Group project

Intro

Welcome to the introduction to the group project!

Here you will find all the important points related to the group project. The following subchapters explain each step you should take. The group project comprises two parts. In the first part, you will work with your group on creating and conducting a survey. Once you have created a draft of your questionnaire, you will receive feedback from us. After implementing the feedback, you will be ready to start collecting data! In the second part, you will apply the statistical knowledge acquired in class and from the online script: you will analyze the collected data, present your findings and submit your final report (R code and presentation).

Groups

Topics for the group project

Topic: Description

  • Car-sharing vs. vehicle ownership: Develop a questionnaire to explore the attractiveness of car-sharing options for consumers (e.g., Car2go). Are consumers willing and planning to substitute a personal vehicle with a car-sharing option? Is car sharing likely to affect the amount of driving? Which factors influence these decisions?
  • Student canteen (Mensa) and the WU campus*: Develop a questionnaire to measure students’ attitudes toward the canteen and other restaurants on the campus, and the drivers of these attitudes (e.g., quality of meals, price, etc.).
  • Privacy in social media – consumers’ willingness to switch to a secure messaging service: Develop a questionnaire to measure consumers’ willingness to switch from WhatsApp to a secure messaging service (e.g., Threema). What are the main motives (e.g., security concerns, costs, usability, etc.), and are consumers willing to pay for the secure service provider?
  • Consumer preferences for fair-trade products in the apparel industry: Develop a questionnaire to measure consumers’ preferences for sustainable brands and eco fashion. Conduct an experiment to determine whether there are different perceptions regarding the “Fair Trade” effect.
  • The climate debate and green consumption: The climate debate is currently at the top of the agenda of many news outlets. Some public figures who strongly favor one side dominate and emotionalize the debate (e.g., Greta Thunberg, Donald Trump). Explore to what extent consumers are willing to change their behavior (e.g., cut air travel) to help protect the environment. What factors influence the willingness to change (e.g., social factors, convenience)?
  • Self-driving cars: Companies such as Google invest heavily in the development of self-driving cars. Develop a questionnaire to measure consumers’ attitudes and usage intentions for self-driving cars. What are the drivers and deterrents of consumers’ willingness to adopt this innovation?
  • Freemium business models in the music industry: Many music streaming services (e.g., Spotify) offer a baseline version free of charge to consumers but charge for a premium version with additional features. Develop a questionnaire to measure consumers’ attitudes towards legal music streaming providers. What factors influence the attitude (e.g., occupation, gender, usage behavior etc.), and how could companies motivate consumers to convert to the premium version of the service?
  • Consumers’ attitude towards legal video streaming providers and piracy: Video streaming providers like Netflix record a continuous increase in registered users. On the other hand, illegal video streaming portals (e.g., Popcorn Time) are heavily used by other consumers. Develop a questionnaire to measure consumers’ attitudes towards legal video streaming providers and the drivers of these attitudes (e.g., occupation, gender, usage behavior etc.). What could be reasons for piracy?
  • Impact of Coronavirus on online shopping services: More and more people have used online shopping services (e.g., Amazon Fresh, BillaOnline) to buy groceries since Coronavirus started spreading. Develop a questionnaire to measure consumers’ attitudes towards the online services before and during the Coronavirus outbreak, and the drivers of these attitudes (e.g., price, service). Are consumers less price sensitive when shopping online than in offline stores? How likely are consumers to continue using online shopping services in the future?
  • Implications of Coronavirus on brand loyalty: After the outbreak of Coronavirus many consumers strayed from normal shopping patterns and began stockpiling products. Many retailers were working around the clock with their selling partners to ensure availability of their products and to bring on additional capacity. How did this affect brand loyalty? Did consumers try out alternative brands? If so, which factors drove consumers to change the brand they used before?
  • Consumers’ willingness to pay for organic products: Develop a questionnaire to measure consumers’ willingness to pay for organic products (e.g., milk). How much are consumers willing to pay for organic milk vs. conventional milk? What is the observed price premium? How does this vary across consumers? What are the drivers? Does it reflect a desire to achieve better health, eat better quality food, or to contribute to environmental protection?
  • The most liveable city in the world: Vienna is frequently listed as one of the most liveable cities in the world (e.g., by the Economist Intelligence Unit). Develop a questionnaire to investigate the reasons why Vienna ranks so high in different rankings. What are the factors that contribute to its image? Are there differences between different groups of people?

Part 1: Before collecting data

An aim of this course is to develop your ability to translate business problems into actionable research questions and to design an adequate research plan to answer these questions. Therefore, you need to be equipped with knowledge on how to create a survey and properly conduct research.

Generally, what you can expect from survey design is similar to what one experiences in a relationship: if you try to take more than you commit, it doesn’t work out. On a more serious note, if you follow the guidelines mentioned here, you will avoid the usual traps your fellow colleagues were caught in.

In a research process, conducting a survey is a part of (primary) data collection. Before we collect data, we have to make sure that preceding steps are correctly done. However, in the following sections we will focus on the process of designing a questionnaire. Eventually, you will be able to collect relevant data and apply appropriate statistical tests.

Note that this assignment may require you to deal with and integrate knowledge that has not yet been covered in class! Students are expected to read ahead and collect additional information to the extent to which their project requires this.

Research design

As you aim to conduct real marketing research, you need to come up with a research design before you start writing down questions for a questionnaire. In particular, you should review the research questions, hypotheses and characteristics that influence the research design.

If you are interested in the causal effect of one particular (independent) variable on another (dependent) variable, think about an experimental design that might allow you to manipulate this variable. In this case, you particularly have to decide on the following:

  • Which variable to manipulate?
  • Whether to use a between-subjects or within-subjects design?
  • The cause-effect sequence (the cause must occur before the effect)
  • The number of experimental conditions
  • Potential interactions and relationships with other variables (does the effect depend on another variable?)

What you need to be careful about is reverse causation. This refers to the situation where the causal relationship could possibly run in the opposite direction from what we assumed in the first place. For instance, it is often assumed that an increase in individual income leads to an increase in well-being (happiness). However, some research suggests that this causation could run in the opposite direction, i.e. that an increase in an individual’s well-being leads to an increase in income.

Here are some examples of causal research design applications:

  • To assess how a product’s country-of-origin impacts attractiveness across different countries.
  • To analyse the effects of rebranding on customer loyalty.

If you would like to analyze the effects of multiple categorical or continuous (independent) variables on one continuous (dependent) variable, you might use a regression model. When doing this, you particularly have to decide on:

  • How to measure the dependent variable (DV). This is particularly important, since you need a variable that is powerful in uncovering variation between subjects (e.g., open-ended questions, such as “How much are you willing to pay for this product?” are good candidates). Moreover, you also need to consider the nature of your DV, i.e. whether it is an interval, ordinal or categorical variable. The nature of your DV will heavily influence your choice of the correct statistical test.

  • How to measure the independent variables (IV) (single-item vs. multi-item scales, categorical vs. continuous). Bear in mind that the nature of the IV, together with DV, affects your choice of a statistical test as well.

  • What other variables might cause the effect that you would like to investigate (to prevent omitted variable bias, i.e. variables that are not part of your model but still influence the dependent variable).

  • Potential interactions (e.g., is the effect of variable X stronger for group A vs. B?)

Survey method

In the next step you should review the type of survey method you will use.

At this point you need to think about the setting in which you aim to conduct your survey. For instance, should you do it in a face-to-face setting or rather online? Here you can find some advantages and disadvantages of online surveys:

Here is the list of the online tools you can use to conduct an online survey (usually for free):

For the purpose of this course, we suggest using Qualtrics.

Creating a questionnaire in Qualtrics starts with creating a Qualtrics project. Each project consists of a survey, a distribution record, and a collection of responses and reports. There are three ways to create a questionnaire. First, you can create a new survey project from scratch. Second, you can create a new questionnaire from a copy of an existing questionnaire. Third, you can create one from a template in your Survey Library or from an exported QSF file.

In order to create a completely new questionnaire, you need to do the following:

  1. Go to the Projects page by clicking the Qualtrics XM logo or clicking Projects on the top-right.
  2. Create a new project by clicking the blue button on the right side.
  3. In the “Create your own” section, click on the survey button.
  4. Enter a name for your survey and get started with creating the survey.

If you would like to create a new questionnaire on the basis of an existing one, choose “From a Copy”. Then indicate the questionnaire you would like to copy. Now you are good to go!

If there is a questionnaire in the Qualtrics Library you would like to use, then you need to choose “From Library”, and indicate one library name in the dropdown menu.

Some useful tips when creating a questionnaire in Qualtrics:

  • Add a progress bar so that respondents know how many pages are left (see “Look & Feel” menu in Qualtrics).

  • Remember to activate the “Force Response” field under “Validation Options” if you don’t want to allow respondents to skip questions.

  • Check the usability on mobile devices using the preview option (make sure the “Mobile friendly” option is checked).

Questionnaire

After you have set everything up, you should develop 20 to 25 questions. There are some important objectives to keep in mind while developing a questionnaire:

  • Information you are primarily interested in (dependent variable)
  • Information which might explain the dependent variable (independent variables)
  • Other factors related to both dependent and independent factors
  • Who’s answering the questions?

Once you have sorted out the answers to the previous questions, you are ready to start writing the content. Again, here are some important things to remember:

  • The purpose of the questionnaire
  • Why it is important for you and why it could be useful for the respondent
  • How long it should take to complete & the final date for a reply
  • Ask questions in a logical order & use the right type of questions
  • Aim for brevity & use simple language

Questionnaire and research design

The questionnaire design should be aligned with the research design! Therefore, in the following sections we will explain some suggested steps on how to approach questionnaire creation.

Let’s start with what is a questionnaire. A structured questionnaire is a research instrument designed to elicit specific information from a sample of a target population. Usually it is used in a standardized way with fixed-alternative questions (same questions and response options for all respondents).

An objective of a questionnaire is threefold:

  • to translate the information need into a set of specific questions that the respondent can and will answer,
  • to motivate, and encourage respondents to become involved, to cooperate, and to complete the questionnaire,
  • to minimize response error.

Content in a questionnaire

In this step you start working on the content of your questions.

At the beginning of the questionnaire you should give a brief introduction to your respondents in the context of your research and the content of the questionnaire. Try to use simple language and avoid technical terms. Additionally, in the introduction you should state how long the survey will approximately take.

When you start thinking about the questions to ask, there are several points to consider:

  • Is the question necessary?
  • Will I obtain the needed information?
  • Are several questions needed instead of one?
  • What type of data can I collect by asking that question (categorical or continuous)?

In your survey, try to avoid asking double-barrelled questions. A double-barrelled question is a single question that attempts to cover two issues. Such questions can be confusing to respondents and result in ambiguous responses. Instead, you might ask multiple questions in order to obtain the intended information.

Incorrect:

Do you think Nike Town offers better variety and prices than other Nike stores?

Correct:

Do you think Nike Town offers better variety than other Nike stores?
Do you think Nike Town offers better prices than other Nike stores?

Inability and unwillingness to answer

The quality of the collected data highly depends on your ability to address the right participants. Therefore, you need to make sure that your respondents are able to meaningfully answer your questions.

Examples:

  • Not every household member might be informed about monthly expenses for groceries purchases if someone else makes these purchases.
  • Use filter questions that measure familiarity and product use.
  • Include a “don’t know” option.
  • If you ask participants for monetary values (e.g. how much are you ready to pay for the XY product?) across several EU countries, make sure you indicate the correct currency (e.g. HUF for Hungary or CZK for Czechia).
  • Think about how mobile-friendly the layout of your survey is (if it is an online survey).
  • Good practice suggests that there should not be more than 2 questions per page (for online surveys displayed on mobile phones).

If you are asking participants to recall certain brands, for instance, make sure you use an unaided recall question:

Correct:

What brands of soft drinks do you remember being advertised on TV last night?

Incorrect:

Which of these brands were advertised last night on TV?
a) Coca-Cola
b) Pepsi
c) Red Bull
d) Evian
e) Don’t know

If you are asking participants to list something, good practice is to minimize the effort required from respondents:

Correct:

Please check all the departments from which you purchased merchandise on your most recent shopping trip to a department store:
a) Women’s dresses
b) Men’s apparel
c) Children’s apparel
d) Cosmetics
e) Jewelry
f) Other (please specify) ___________

Incorrect:

Please list all the departments from which you purchased merchandise on your most recent shopping trip to department store X.

If you are asking for information that could be considered sensitive (e.g. money, family life, political beliefs, religion), place these questions at the end of the questionnaire. Moreover, it is advisable to provide response categories rather than asking for specific figures:

Correct:

Which one of the following categories best describes your household’s annual gross income?
a) under 25.001 €
b) 25.001 € to 50.000 €
c) 50.001 € to 75.000 €
d) 75.001 € to 100.000 €
e) over 100.000 €

Incorrect:

What is your household’s exact annual income?

Decide on measurement scales and scaling techniques

Every statistical analysis requires that variables have a specific level of measurement. The measurement scales you choose for your survey questions affect the answers you get and, eventually, the statistical tests you can apply. For instance, it does not make sense to compute an average of a categorical variable such as gender. Even if gender is coded numerically (e.g. male=0, female=1), the mean is not an “average gender”; at best it can be read as the share of respondents coded 1.
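
To make the point about measurement levels concrete, here is a minimal sketch in Python (standard library only). The answers are made up for illustration. It shows which summaries are meaningful for a nominal variable and what the mean of a 0/1 coded variable actually represents.

```python
from collections import Counter
from statistics import mean

# Hypothetical answers to a single-answer gender question (nominal data)
gender = ["female", "male", "female", "female", "male"]

# For nominal data, frequencies and the mode are meaningful summaries
counts = Counter(gender)
mode_category = counts.most_common(1)[0][0]

# With the coding male=0, female=1 the mean equals the share of "female" answers;
# it is a proportion, not an "average gender"
coded = [1 if g == "female" else 0 for g in gender]
share_female = mean(coded)

print(counts, mode_category, share_female)
```

With more than two categories (e.g. country coded 1, 2, 3) the mean of the codes has no such interpretation at all, which is why the choice of scale should be made before the question is added to the survey.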

It is crucial to become familiar with the possibilities of each scale before you add another question to your survey. This lowers the risk of obtaining data you did not intend to collect and of being unable to apply the tests you had planned.

The following table gives a quick overview of the possibilities of each measurement scale:

The table below shows a general procedure for choosing the correct analysis based on the measurement scale of your data and the number of variables. It covers the statistical analyses from this course and aims to help you choose among them based on the nature of the dependent variable on the one side, and the nature and number of your independent variables on the other side:

We highly recommend that you think about what type of data you want to collect and which test you want to use before you formulate a question and add it to the survey. Otherwise, you may end up with data you did not want to collect and, moreover, with data unsuitable for the test you intended to use.

A very useful overview of the statistical tests associated with different types of variables: CHOOSING THE CORRECT STATISTICAL TEST - UCLA

The most frequent types of questions

Here we want to show you the most frequent types of questions students use and what type of data can be collected by using them.


Multiple choice question

A multiple choice question with a single answer is a type of closed-ended question that lets respondents select one answer from a defined list of choices. The type of data you obtain is categorical.

Statistical tests to consider when analysing categorical data:

  • Fisher’s exact test
    • Used when the expected frequency in at least one cell is less than 5. When the expected frequencies in all cells are at least 5, the Chi-square test should be used
    • 1 dependent variable and 1 independent variable with 2 or more levels/factors
    • Hypothesis: Is there a significant difference between the frequencies observed in the cells and the frequencies expected in the cells?
  • Chi-square test
    • Goodness of fit: when you only have 1 dependent variable and no independent variables
      • Hypothesis: Is there a significant difference between the frequencies observed in the cells and the frequencies expected in the cells?
    • Chi-Square Test of Independence: when you have 1 dependent variable and 1 independent variable with 2 or more levels/factors.
      • Hypothesis: Is there an association between categorical variable X and categorical variable Y?
  • Binomial logistic regression
    • Used when you have an independent variable of at least interval scale and the dependent variable is a categorical variable that can take on exactly two values (1 or 0, i.e., yes or no).
  • Categorical variables can be used as predictors in regression (as dummy variables).
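
The difference between the chi-square test of independence and Fisher’s exact test can be sketched for the simplest case, a 2x2 table. The following Python code (standard library only) is an illustration under stated assumptions, not the course’s analysis code: the function names and both tables are hypothetical, and the p-value shortcut for the chi-square test is only valid for 1 degree of freedom.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2))
    return statistic, erfc(sqrt(statistic / 2))

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probabilities of all tables at most as likely as the observed one
    return sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: rows = two respondent groups, columns = two answer options
large_table = [[30, 10], [20, 20]]  # all expected counts are at least 5
small_table = [[3, 1], [1, 3]]      # expected counts are below 5

chi2, chi2_p = chi_square_2x2(large_table)
fisher_p = fisher_exact_2x2(small_table)
print(round(chi2, 3), round(chi2_p, 4), round(fisher_p, 4))
```

The chi-square test relies on a large-sample approximation, which is why it is reserved for tables with sufficiently large expected counts, whereas Fisher’s test enumerates all possible tables with the same margins and therefore also works for small samples.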

It is important to distinguish between multiple choice questions with a single answer and those with multiple answers (which will be presented later), as their analysis differs.

To analyse results collected with a multiple choice question with multiple possible answers, we can use Cochran’s Q test. Although we have not covered it before, it is not too different from the other tests you have already learned about.

The Cochran’s Q test and associated multiple comparisons require the following assumptions:

  1. Responses are dichotomous and from k number of matched samples.
  2. The subjects are independent of one another and were selected at random from a larger population.
  3. The sample size is sufficiently “large”. (As a rule of thumb, the number of subjects for which the responses are not all 0’s or 1’s, n, should be ≥ 4 and nk should be ≥ 24)
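
As an illustration of how Cochran’s Q is computed, here is a minimal Python sketch (standard library only). The response matrix is made up: each row is one respondent and each column one answer option of a multiple-answer question (1 = selected, 0 = not selected). The function name is hypothetical, and the closed-form p-value used here only applies when there are exactly k = 3 options (2 degrees of freedom).

```python
from math import exp

def cochran_q(responses):
    """Cochran's Q statistic for a respondents x options matrix of 0/1 values."""
    k = len(responses[0])
    col_totals = [sum(col) for col in zip(*responses)]
    row_totals = [sum(row) for row in responses]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    return numerator / denominator

# Hypothetical data: 10 respondents, 3 answer options
responses = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1],
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 1, 0],
]

q = cochran_q(responses)
# Q is compared with a chi-square distribution with k - 1 degrees of freedom;
# for exactly 2 degrees of freedom the upper-tail probability is exp(-q / 2)
p_value = exp(-q / 2)
print(q, round(p_value, 4))
```

In this example 8 of the 10 respondents give answers that are not all 0 or all 1, so n = 8 and nk = 24, which just meets the rule of thumb stated above.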

Rank order question

A rank order question asks respondents to compare items to each other by placing them in order of preference. Note that the data obtained from a rank order question show the order of a respondent’s preferences, but not the distance between items. For instance, if the most important feature of a fitness tracker for respondent XY is “Measuring steps” and the second most important feature is “Calories burned”, we don’t know how much more important the former is than the latter.

In order to analyze results from a rank order question, we use the Friedman rank sum test.

The Friedman rank sum test is used to identify whether there are any statistically significant differences between the distributions of 3 or more paired groups. It is used when the normality assumptions for a one-way repeated measures ANOVA are not met. The Friedman rank sum test is also used when the dependent variable is measured on an ordinal scale, as in our case.
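
The Friedman statistic can be computed directly from the ranks. The following Python sketch (standard library only) assumes made-up rankings of three fitness-tracker features by five respondents and no ties within a respondent, which holds for a rank order question. The function name is hypothetical, and the closed-form p-value is only valid for k = 3 items (2 degrees of freedom).

```python
from math import exp

def friedman_statistic(ranks):
    """Friedman chi-square statistic for a respondents x items matrix of ranks.

    Assumes every respondent ranks all k items from 1 to k without ties.
    """
    n, k = len(ranks), len(ranks[0])
    rank_sums = [sum(col) for col in zip(*ranks)]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical ranks (1 = most important) for three features, five respondents
ranks = [
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
    [1, 2, 3],
]

statistic = friedman_statistic(ranks)
# Compared with a chi-square distribution with k - 1 = 2 degrees of freedom,
# whose upper-tail probability is exp(-x / 2)
p_value = exp(-statistic / 2)
print(round(statistic, 2), round(p_value, 4))
```

The chi-square approximation is rough for samples as small as this one; the example is only meant to show where the statistic comes from.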

Constant Sum question

If you wish to obtain information about how much one attribute is preferred over another, you may use a constant sum scale. The total box should always be displayed at the bottom to make it easier for respondents. A constant sum question permits the collection of ratio data. With the data obtained we are able to express the relative importance of the options.

With the data collected we are able to answer the question: what factor is the most important for our respondents when they go out for a dinner?

In order to answer this question we need to conduct a repeated measures ANOVA.

This type of ANOVA is used for analyzing data where the same subjects are measured more than once. In our case, every respondent is measured on each of the factors (location, price, ambience and customer service). Repeated measures ANOVA is an extension of the paired-samples t-test and is also referred to as a within-subjects ANOVA. In a within-subjects design, the same individuals are measured on the same outcome variable at different time points or under different conditions.
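
To show how the variance is partitioned in a one-way repeated measures ANOVA, here is a minimal Python sketch (standard library only). The scores and the function name are hypothetical. The sketch only returns the F statistic; obtaining a p-value requires the F distribution, which statistical software provides.

```python
from statistics import mean

def rm_anova_f(scores):
    """F statistic of a one-way repeated measures ANOVA.

    scores[i][j] is the value of respondent i under condition j.
    """
    n, k = len(scores), len(scores[0])
    grand_mean = mean([x for row in scores for x in row])
    condition_means = [mean(col) for col in zip(*scores)]
    subject_means = [mean(row) for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Removing between-subject variation is what distinguishes this test
    # from a between-subjects one-way ANOVA
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_conditions / df_conditions) / (ss_error / df_error)

# Hypothetical scores of four respondents under three conditions
scores = [
    [8, 6, 4],
    [7, 5, 3],
    [9, 6, 6],
    [8, 7, 3],
]

f_statistic = rm_anova_f(scores)
print(f_statistic)
```

Note that with constant sum data every respondent’s values add up to the same total, so the subject means are identical and the between-subject sum of squares is zero.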

Text or number entry question

A text or number entry question is the recommended type of question if you are interested in obtaining ratio data. We will use this type of question together with a constant sum question to collect data that can be analysed with regression analysis. Note that in this case we treat constant sum data as ratio data and therefore assume that 0 means complete absence.

Scaling techniques

Scaling techniques are meant to study the relationship between objects. They are broadly classified into comparative and non-comparative scales.

In non-comparative scales, each object is scaled independently of the other objects. The resulting data are generally assumed to be interval or ratio scaled.

Comparative scales (or nonmetric scaling) directly compare the stimulus objects. For example, the respondent might be asked directly about his or her preference between domestic and foreign beer brands. As a result, the comparative data collected can only be interpreted in relative terms. In the following sections we will walk through both types of scales and briefly introduce them.

Comparative scale: Paired Comparison

  • Respondent is presented with two objects and asked to select one according to some criterion.
  • The resulting data are ordinal in nature.
  • The assumption of transitivity (if X > Y and Y > Z, then X > Z) enables paired comparison data to be converted into a rank order. To do so, you need to identify the number of times each object is preferred by summing across all comparisons.
  • Effective when the number of objects is limited, as it requires direct comparisons, and a larger number of objects makes the comparisons unmanageable.
  • Example:
    For each pair, please indicate which of the two brands of beer in the pair you prefer.
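
The conversion of paired comparison data into a rank order can be sketched as follows in Python (standard library only). The brand names and the recorded preferences of one respondent are hypothetical, and the approach assumes that the preferences are transitive.

```python
from collections import Counter

brands = ["Brand A", "Brand B", "Brand C", "Brand D"]

# Hypothetical answers of one respondent: for each pair, the preferred brand
preferred = {
    ("Brand A", "Brand B"): "Brand A",
    ("Brand A", "Brand C"): "Brand A",
    ("Brand A", "Brand D"): "Brand D",
    ("Brand B", "Brand C"): "Brand C",
    ("Brand B", "Brand D"): "Brand D",
    ("Brand C", "Brand D"): "Brand D",
}

# Count how often each brand was preferred (brands with no wins keep a count of 0)
wins = Counter({brand: 0 for brand in brands})
wins.update(preferred.values())

# Under transitivity, sorting by number of wins yields the rank order
ranking = sorted(brands, key=lambda brand: wins[brand], reverse=True)
print(ranking)
```

With n objects there are n(n-1)/2 pairs, so 4 brands require 6 comparisons while 10 brands would already require 45, which illustrates why the technique is only practical for a small number of objects.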

Comparative scale: Rank Order

  • Allow a certain set of brands or products to be simultaneously ranked based upon a specific attribute or characteristic.
  • Rank order scaling is a good proxy for the shopping setting, as objects are compared simultaneously.
  • The rank order scaling results in the data of ordinal nature.
  • Example:
    Rank the various brands of beer in order of preference. Begin by picking out the one brand that you like most and assign it a number 1. Then find the second most preferred brand and assign it a number 2. Continue this procedure until you have ranked all the brands of beer in order of preference. No two brands should receive the same rank number.

Comparative scale: Constant sum

  • Respondents allocate a constant sum of units (e.g., points, dollars) among a set of stimulus objects with respect to some criterion.
  • Constant sum is similar to rank order, but it carries specific units.
  • The resulting data does not just indicate important factors, but also by how much a factor supersedes another one.
  • Constant sum scaling can be used to observe the comparative significance respondents assigned to various factors of a subject.
  • Example:
    There are 8 attributes of bottled beers. Please allocate 100 points among the attributes so that your allocation reflects the relative importance you attach to each attribute.

  • Basic analysis of constant-sum data involves tabulating responses and presenting them either as quantities (e.g., “On average, 7 points were allocated to ‘high alcohol level’”) or as proportions (e.g., “On average, 7% of points were allocated to ‘high alcohol level’”).
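
The basic tabulation of constant-sum data can be sketched in Python (standard library only). The attribute names and the point allocations of the three respondents are made up for illustration.

```python
TOTAL_POINTS = 100

# Hypothetical allocations of 100 points by three respondents
allocations = [
    {"taste": 50, "price": 30, "alcohol level": 20},
    {"taste": 60, "price": 30, "alcohol level": 10},
    {"taste": 40, "price": 45, "alcohol level": 15},
]

# Keep only responses that add up to the required total
valid = [a for a in allocations if sum(a.values()) == TOTAL_POINTS]

attributes = list(valid[0])
# Average number of points per attribute
mean_points = {attr: sum(a[attr] for a in valid) / len(valid) for attr in attributes}
# The same information expressed as a share of all points
mean_share = {attr: points / TOTAL_POINTS for attr, points in mean_points.items()}

print(mean_points)
print(mean_share)
```

Checking that each response adds up to the required total is a sensible first step, since survey tools do not always enforce it unless validation is switched on.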

Non-Comparative Scales: Continuous Rating Scales

  • Participants rate the objects by placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the other.
  • One of the advantages of the continuous rating scale is that it is easy to administer.

  • Once the ratings are collected, you can split them into categories and assign each rating a score depending on the category in which it falls.
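
Splitting continuous ratings into categories can be sketched as follows in Python (standard library only). The 0 to 100 line, the five equally wide categories and the function name are assumptions made for illustration.

```python
def to_category(rating, n_categories=5, scale_max=100):
    """Map a mark on a 0..scale_max line to a category from 1 to n_categories."""
    width = scale_max / n_categories
    # The upper end of the scale belongs to the highest category
    return min(int(rating // width) + 1, n_categories)

# Hypothetical positions marked by four respondents
ratings = [12.5, 48.0, 73.2, 100.0]
categories = [to_category(r) for r in ratings]
print(categories)
```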

Non-Comparative Scales: Itemized Rating Scales

  • The respondents are provided with a scale that has a number or brief description associated with each category.
  • The categories are ordered in terms of scale position, and the respondents are required to select the specified category that best describes the object being rated.
  • The commonly used itemized rating scales are the Likert, semantic differential and Stapel scales.

Itemized Rating Scales: Likert scale

  • Requires respondents to indicate their attitude towards the given object through their degree of agreement or disagreement with each of a series of statements, typically using five or seven categories.
  • Reverse-coding some items increases validity.
  • One limitation is the time required to answer a question on a Likert scale. Compared to other itemized scaling techniques, the Likert scale is more time consuming, as each respondent is required to read every statement given in a questionnaire before assigning a numerical value to it.
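
Reverse-coded items have to be recoded before the items of a multi-item scale are combined. Here is a minimal Python sketch (standard library only), assuming a five-point scale and hypothetical item names, in which one item is worded negatively.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5
REVERSED_ITEMS = {"item_2"}  # hypothetical negatively worded item

def reverse_code(value):
    """Mirror a rating on the scale, e.g. 1 <-> 5 and 2 <-> 4 on a five-point scale."""
    return SCALE_MIN + SCALE_MAX - value

# Hypothetical answers of one respondent
answers = {"item_1": 4, "item_2": 2, "item_3": 5}

recoded = {
    item: reverse_code(value) if item in REVERSED_ITEMS else value
    for item, value in answers.items()
}
# Multi-item score: the mean of the recoded items
attitude_score = mean(list(recoded.values()))
print(recoded, round(attitude_score, 2))
```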

In the table below you can find a couple of commonly measured constructs in marketing research such as attitude, importance, purchase intention and similar.

Itemized Rating Scales: Semantic Differential

  • Typically, participants rate objects on a number of itemized, seven-point rating scales bounded at each end by one of two bipolar adjectives.

  • Semantic differential can measure respondents’ attitudes towards something (products, concepts, items, people…).

  • It helps you find the respondent’s position on a scale between two bipolar adjectives such as “Sweet-Sour” or “Bright-Dark”. In comparison to the Likert scale, which uses generic scales (e.g. extremely dissatisfied to extremely satisfied), semantic differential questions are posed within the context of evaluating attitudes.

  • Widely used rating scale in marketing research due to its versatility

When creating a semantic differential question, you should consider the following:

  • Number of categories:

  • Balanced vs. unbalanced:

  • Odd/even number of categories:

  • Forced vs. non-forced response

  • Verbal description:

Questionnaire structure

The sequence of questions in a questionnaire can play an important role. For instance, more sensitive questions (such as demographic questions) are usually placed at the end, as they can trigger a change in the respondent’s behavior.

If you plan to conduct an online survey, you need to think about the respondent’s experience while completing your questionnaire. For instance, spread the content over several short pages rather than a few long ones. In online surveys, two questions per page is a useful rule of thumb. Generally, respondents are reluctant to read and fill out long questionnaire pages, so long pages will lead to a higher dropout rate. To reduce the dropout rate, state in the introduction of the questionnaire approximately how long the survey will take. Note that tools like Qualtrics provide the estimated response time in the survey overview.

Consider that most people use their phones to fill out questionnaires. Think about how the questionnaire will appear on a phone screen too, and pay particular attention to the length of your questions.

In the end, the questionnaire structure has to be aligned with the research design. For example, if your research design features an experiment, this needs to be reflected in the questionnaire (e.g., you need to assign the respondents randomly to the experimental conditions in case of a between-subjects comparison).

Questionnaire structure for a between-subjects design

In a between-subjects design, you randomly assign each respondent to one of the experimental conditions. Respondents then complete tasks only in the condition to which they are assigned.

For instance, suppose we would like to test the effect of two advertisements on purchase intention. One group of (randomly assigned) respondents is exposed to one version of the advertisement, while the other group (also randomly assigned) is exposed to another version. After that, both groups of respondents state their willingness to buy the advertised product. Finally, if the dependent variable (e.g. willingness to buy) is measured on an interval or ratio scale, you can use an independent-samples t-test to compare the group means. The whole experimental design should be organised as follows:

Questionnaire structure for a within-subjects design

This type of experimental design involves exposing each respondent to all of the experimental conditions you are testing. This way, every respondent provides data for every condition.

For instance, suppose we again would like to test the effect of two advertisements on purchase intention, but this time in a within-subjects design. First, each respondent is exposed to the first version of the advertisement and right after that asked to rate his/her willingness to buy the advertised product. Subsequently, each respondent is shown the other version of the advertisement and again rates his/her willingness to purchase the advertised product. Finally, we can compare the means of the two conditions with a paired-samples t-test (given that the data is measured on an interval or ratio scale).
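
As a rough illustration of how the two designs translate into an analysis, the following sketch simulates both cases. All numbers are made up (drawn with rnorm()), and the names between_df, within_df and wtb (willingness to buy) are hypothetical and not part of the course data.

```r
set.seed(123)  # make the simulated example reproducible
n <- 40        # hypothetical number of respondents

# Between-subjects: every respondent is randomly assigned to ONE ad version
between_df <- data.frame(
  condition = sample(rep(c("ad_A", "ad_B"), each = n / 2))
)
between_df$wtb <- rnorm(n,
                        mean = ifelse(between_df$condition == "ad_A", 4.0, 4.5),
                        sd = 1)

# Independent-samples t-test: compares the means of two separate groups
# (R applies Welch's correction for unequal variances by default)
between_test <- t.test(wtb ~ condition, data = between_df)

# Within-subjects: every respondent rates BOTH ad versions
within_df <- data.frame(wtb_ad_A = rnorm(n, mean = 4.0, sd = 1))
within_df$wtb_ad_B <- within_df$wtb_ad_A + rnorm(n, mean = 0.5, sd = 0.8)

# Paired-samples t-test: compares two ratings given by the same respondents
within_test <- t.test(within_df$wtb_ad_A, within_df$wtb_ad_B, paired = TRUE)

between_test$p.value
within_test$p.value
```

Note that the between-subjects data has one rating per respondent plus a condition column, while the within-subjects data has one column per condition. Your questionnaire structure determines which of the two layouts you end up with.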

Question wording

Generally, question wording should enable each respondent to understand the questions and to answer them reliably. Reliability means that, if a respondent was asked the same question again, he/she would give the same answer again. A number of common problems regarding question wording have been identified, so we will address the most important ones.

To ensure reliability, each question should clearly define the issue in terms of who, what, when and where.

Incorrect

Example: Which brand of shampoo do you use?
Who (the respondent): It is not clear whether this question relates to the individual respondent or the respondent’s total household.
What (the brand of shampoo): It is unclear how the respondent is to answer this question if more than one brand is used.
When (unclear): The time frame is not specified in this question. The respondent could interpret it as meaning the shampoo used this morning, this week, or over the past year.
Where (not specified): At home, at the gym? Where?

Correct

A more clearly defined question is:
Which brand or brands of shampoo have you personally used at home during the last month? In the case of more than one brand, please list all the brands that apply.


Use ordinary words. Words should match the vocabulary level of the participants.

Incorrect

“Do you think the distribution of soft drinks is adequate?”

Correct

“Do you think soft drinks are easily available when you want to buy them?”


Avoid double negative form. Double negative question forms can confuse respondents, especially when they need to answer with “Agree” or “Disagree”.

Incorrect

Do you think that it is not uncommon that boys play basketball?

Correct

In your opinion, is it common that boys play basketball?


Avoid leading questions. Leading questions hint at what the answer should be. Such questions introduce a bias in a particular direction.

Incorrect

“Is Colgate your favorite toothpaste?”

Correct

“What is your favorite brand of toothpaste?”


Avoid ambiguous words. Words such as usually, normally, frequently, often, regularly, and other similar words, do not define frequency clearly enough.

Incorrect

“In a typical month, how often do you go to a movie theater to see a movie?”
a) Never
b) Occasionally
c) Sometimes
d) Often
e) Regularly

Correct

“In a typical month, how often do you go to a movie theater to see a movie?”
a) Less than once
b) 1 or 2 times
c) 3 or 4 times
d) More than 4 times


Choose adequate order

One of the last steps in designing a questionnaire is choosing an adequate order of questions and instructions for respondents.

At the beginning, you should provide a short and easy-to-understand introduction to the topic. Use simple language and avoid technical terms (e.g., not many people will know the terms “manufacturer brand” and “store brand”). Additionally, in the introduction you should state how long the survey will approximately take.

The opening questions should be interesting, simple and non-threatening. They are crucial because they are the respondent’s first exposure to the questionnaire and are likely to set the tone for the rest of the questions. If the opening questions are too difficult to understand, or sensitive in some way, respondents are likely to stop answering. Qualifying questions (or screening questions) should serve as the opening questions (if applicable). Their purpose is to identify whether a potential respondent is eligible to proceed with the survey.

After the opening part, you should establish an optimal question flow. General questions should precede specific questions. Questions on one subject, or one particular aspect of a subject, should be grouped together. Respondents may find it confusing to be asked to return to a subject they thought they had already given their opinions about.

As respondents are moving towards the end of the questionnaire, they are likely to become increasingly indifferent and might give careless answers. Therefore, questions of special importance should ideally be included in the earlier part of the questionnaire.

Finally, you should pay particular attention to providing all necessary definitions and explanations before you ask a question. This ensures that the questions are understood in a consistent way by every respondent.

Test your questionnaire

Finally, before you distribute the final questionnaire, there are some things to consider. First, you should always pretest your questionnaire before sharing it! Test all aspects of the questionnaire (content, wording, sequence, form & layout, etc.). If possible, use respondents in the pretest who are similar to those who will be included in the actual survey. Ideally, the pretest sample size should be small (in a real scenario this could vary from 15 to 30 respondents; for the group project, a lower number will be sufficient). After each significant revision of the questionnaire, conduct another pretest, using a different sample of respondents. Finally, code and analyze the responses obtained from the pretest to make sure that you collected the information you intended to collect.

After testing your questionnaire you should be able to determine whether:

  • The questions are properly framed
  • The question wording triggers any biases
  • The questions are placed in the optimal order
  • The questions are understandable
  • Additional clarifying questions are needed or some questions need to be eliminated

Pitch, revision & submission

At this stage, you should be ready for pitching your questionnaire. Please keep in mind the timetable.

Date_A Time_A Date_B Time_B Task Chapters Link
Oct. 21 11:59PM Oct. 25 11:59PM * Submit questionnaire draft 10
Oct. 23* 09:00AM - 02:30PM Oct. 27* 02:00PM - 08:00PM * Coaching: Questionnaire design (live video coaching) 10 TBC
Nov. 1 11:59PM Nov. 4 11:59PM * Submit revised questionnaire 10
Note: Dates and times are indicated for groups A and B respectively.
Sessions indicated with '*' are group coaching sessions. Slots of 45 min. are assigned to each group within the indicated times.

Part 2: Collecting data and analysis

Data analysis


In this chapter we look at the nature of the data you collect when conducting a survey. This will help you choose a question type depending on the kind of data you want to collect and on the statistical tests you want to apply.

Multiple choice with a single answer

Multiple Choice with a single answer is a type of closed-ended question that lets respondents select one answer from a defined list of choices.

The type of data you obtain is categorical, and the output comes in the following form:

What to do with this data now? First, we need to load it into R and prepare it for analysis. R recognizes the numbers you see in the output as numeric. In order to conduct statistical modeling and properly visualize our results, we need to convert our data to the factor class.

A factor (or coding variable) represents different groups of data by using numbers (integers). Factors look like numeric variables, but the numbers stand for the labels/names of the data groups, i.e. they represent a nominal variable. These data groups are called ‘levels’.
In our case, the multiple choice question output will contain 5 data groups (‘Never’, ‘1-2 hours’, ‘3-4 hours’, ‘5-6 hours’, ‘more than 6 hours’) after converting it to a factor:

# Convert numeric value to factors
qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?' <- factor(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?', levels = c(1:5), labels = c('Never','1-2 hours','3-4 hours','5-6 hours','more than 6 hours'))

qualtrics$` Selected Choice_1` <- factor(qualtrics$` Selected Choice_1`,levels = c(1:2),labels = c("Male","Female"))

qualtrics$` Selected Choice` <- factor(qualtrics$` Selected Choice`, levels = c(1:2), labels=c("Austria","Germany"))


# Table
table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')

            Never         1-2 hours         3-4 hours         5-6 hours 
               19                18                22                35 
more than 6 hours 
               23 
table(qualtrics$` Selected Choice`)     #countries

Austria Germany 
     35      82 
table(qualtrics$` Selected Choice_1`)   #gender

  Male Female 
    49     68 
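
To see what the conversion does in isolation, here is a minimal sketch on a made-up vector. The object names codes and hours and the seven example answers are hypothetical; only the level labels are taken from the code above.

```r
# Hypothetical raw answers as they might be exported: numeric codes 1 to 5
codes <- c(1, 3, 2, 5, 3, 4, 1)

# Attach labels to the codes; 'levels' lists the codes, 'labels' their meaning
hours <- factor(codes,
                levels = 1:5,
                labels = c('Never', '1-2 hours', '3-4 hours',
                           '5-6 hours', 'more than 6 hours'))

class(codes)       # numeric before the conversion
class(hours)       # factor after the conversion
levels(hours)      # the five data groups
as.integer(hours)  # the underlying integer codes are still there
table(hours)       # counts per level, including levels with a count of 0
```

Because the levels are stored with the factor, table() and the plotting functions keep the categories in the intended order instead of sorting them alphabetically.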

Second, you might want to visualize your results. In order to do so, the data needs to be in an appropriate format. Here we continue with adapting the data format from the point where we stopped:

# Converting long format to the visualisation-friendly format
mlc_visualisation <- as.data.frame(table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?'))

# Naming columns
names(mlc_visualisation) <- c('Time','Count')

# Observing
knitr::kable(mlc_visualisation)

Time Count
Never 19
1-2 hours 18
3-4 hours 22
5-6 hours 35
more than 6 hours 23


The simplest way to visualize data obtained from a multiple choice question with a single answer is a bar chart:

## Basic bar chart
labels <- as.character(mlc_visualisation$Time) #Save labels for x-axis in the barplot
barplot(mlc_visualisation$Count, # Column to visualize
        xlab='Time', # X-axis label
        ylab = 'Count(answers)', # Y-axis label
        names.arg = labels,
        main = 'How many hours do you spend watching movies or series on Netflix?') # Title

The R package ggplot2 allows you to create visually appealing graphs:

## ggplot2 bar chart
library(ggplot2)
p <- ggplot(data=mlc_visualisation, 
             aes(x=Time, y=Count, fill=Time)) +
             geom_bar(stat='identity') + theme_minimal() + labs(title = "In a typical week, how many hours do you spend watching movies or series on Netflix?")
p

Another R library which can help you make interactive charts quickly is plotly. Here we use a function called ggplotly(), which allows you to make any ggplot2 chart interactive. Since we have already created a bar chart using ggplot2 and saved it as “p”, we will just turn it into a plotly graph:

## ggplotly bar chart

library(plotly)
ggplotly(p)

Another package built around a similar grammar of graphics is ggvis, which is aimed at interactive graphics. Note that its development is currently dormant:

## ggvis bar chart

library(ggvis)
ggvis(mlc_visualisation, 
      x = ~Time, 
      y = ~Count, 
      fill=~Time)

The data collected with the previous question is ordinal, as the levels have a natural order. Ordinal data is a type of categorical data. For the analysis of categorical data we can use the Chi-square test, or Fisher’s exact test if the count for some level is less than 5.

Fisher’s exact test

Fisher’s exact test is used to test a hypothesis with data obtained from multiple choice questions with a single answer. Results from multiple choice questions with multiple answers are analysed with a different test.
  • Application: when you have 1 dependent variable and 1 independent variable with 2 or more levels/factors
  • Used when the frequency in at least one cell is less than 5. When the frequencies in all cells are 5 or more, the Chi-square test should be used.
  • Hypothesis: Is there a significant difference in frequencies between values observed in cells and values expected in cells? (R for Marketing and Research Analytics)
  • H0: There is no relationship between the two categorical variables, i.e. they are independent. Knowing the value of one variable does not help to predict the value of the other variable.
  • H1: There is a relationship between the two categorical variables, i.e. they are dependent. Knowing the value of one variable helps to predict the value of the other variable.
  • Usually, this type of test is used on 2x2 contingency tables. However, it can also be applied to contingency tables of larger dimensions.

Example: We would like to know whether the number of hours spent watching Netflix depends on the respondents’ country of origin.

# Creation of contingency table
fisher_test_table <-table(qualtrics$` Selected Choice`,qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')
# Check what our contingency table looks like
fisher_test_table
         
          Never 1-2 hours 3-4 hours 5-6 hours more than 6 hours
  Austria     3         7         6        11                 8
  Germany    16        11        16        24                15
# Since we have a count less than 5, we should apply Fisher's test instead of Chi-square.

# Fisher's test
test <- fisher.test(fisher_test_table)
test

    Fisher's Exact Test for Count Data

data:  fisher_test_table
p-value = 0.575
alternative hypothesis: two.sided
# p-value
test$p.value
[1] 0.5750401

From the output and from test$p.value we see that the p-value is higher than the significance level of 5%. As with any other statistical test, if the p-value is higher than the significance level, we cannot reject the null hypothesis.

In our case, not rejecting the null hypothesis of Fisher’s exact test of independence means that we found no significant relationship between the two categorical variables. In other words, the data give no indication that knowing the value of one variable helps to predict the value of the other variable.
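
The rule of thumb used above (Fisher’s exact test if any cell count is below 5, otherwise the Chi-square test) can be wrapped in a small helper. This is only a sketch: the function name choose_test and its threshold argument are hypothetical, and the counts are typed in by hand from the contingency table printed above.

```r
# Counts copied from the contingency table above (country x hours)
counts <- matrix(c( 3,  7,  6, 11,  8,
                   16, 11, 16, 24, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Country = c("Austria", "Germany"),
                                 Hours = c("Never", "1-2 hours", "3-4 hours",
                                           "5-6 hours", "more than 6 hours")))

# Hypothetical helper: picks the test according to the rule of thumb in the text
choose_test <- function(tab, threshold = 5) {
  if (any(tab < threshold)) {
    fisher.test(tab)  # at least one small cell count
  } else {
    chisq.test(tab)   # all cell counts reach the threshold
  }
}

result <- choose_test(counts)
result$method   # which test was selected
result$p.value

# Many textbooks phrase the rule in terms of EXPECTED rather than observed
# counts; the expected counts under independence can be inspected with:
chisq.test(counts)$expected
```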

Chi-square test: Goodness of fit & Independence test

  1. Goodness of fit
    • Application: when you only have 1 dependent variable and no independent variables
    • Hypothesis: Is there a significant difference in frequencies between values observed in cells and values expected in cells?
    • H0: There is no significant difference between the observed and the expected frequencies.
    • H1: There is a significant difference between the observed and the expected frequencies.
    • If we don’t specify the expected frequency per cell (see the code below), all cells are expected to show an equal frequency.
    • Example: Do the numbers of respondents who spend different amounts of hours watching Netflix differ significantly from each other?
    • Note that we did not assume any specific distribution, so we are assuming that each count will be the same or similar.
# Creating table 
(mlc_chi_square <- table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?'))

            Never         1-2 hours         3-4 hours         5-6 hours 
               19                18                22                35 
more than 6 hours 
               23 
      
# Chi-square test (without given expected values = equal values )
chisq.test(mlc_chi_square)

    Chi-squared test for given probabilities

data:  mlc_chi_square
X-squared = 7.9145, df = 4, p-value = 0.09476

The p-value of the test is higher than 0.05, so we cannot reject the null hypothesis that respondents are evenly distributed across the categories of hours spent watching Netflix: the observed distribution does not differ significantly from the expected one. This result is not surprising if you look at the counts for each level in the table we created before conducting the test. The counts per level do not deviate much from each other, which is also visible in the previous visualisations.

If we are interested in testing a more specific distribution, e.g. we expect that 40% of our respondents watch Netflix 3-4 hours a week, we can pass the corresponding distribution to the test.

# Expected values in percentages for each alternative. The sum must be 1.
expected_values <- c(0.10, # We expect that 10% of our respondents do not watch Netflix at all ("Never").
                     0.20, # We expect that 20% of our respondents watch Netflix 1-2 hours a week.  
                     0.40, # We expect that 40% of our respondents watch Netflix 3-4 hours a week.
                     0.20, # We expect that 20% of our respondents watch Netflix 5-6 hours a week.
                     0.10 # We expect that 10% of our respondents watch Netflix more than 6 hours a week.
                    )
# Chi-square test with expected values
chisq.test(mlc_chi_square, p=expected_values)

    Chi-squared test for given probabilities

data:  mlc_chi_square
X-squared = 35.607, df = 4, p-value = 3.486e-07

This time the p-value of the test is lower than 0.05. We have evidence that the observed distribution differs significantly from the expected distribution (10%/20%/40%/20%/10%).
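
To see where the test statistic comes from, it can be recomputed by hand as the sum of (observed − expected)² / expected over all categories. The sketch below uses the counts from the table above; the object names are illustrative only.

```r
# Observed counts, copied from the frequency table above
observed <- c("Never" = 19, "1-2 hours" = 18, "3-4 hours" = 22,
              "5-6 hours" = 35, "more than 6 hours" = 23)
n  <- sum(observed)          # total number of respondents
df <- length(observed) - 1   # degrees of freedom: categories minus 1

# Case 1: equal expected frequencies (the default of chisq.test)
expected_equal <- n * rep(1 / length(observed), length(observed))
x2_equal <- sum((observed - expected_equal)^2 / expected_equal)
p_equal  <- pchisq(x2_equal, df = df, lower.tail = FALSE)

# Case 2: the specific expected distribution 10%/20%/40%/20%/10%
expected_custom <- n * c(0.10, 0.20, 0.40, 0.20, 0.10)
x2_custom <- sum((observed - expected_custom)^2 / expected_custom)
p_custom  <- pchisq(x2_custom, df = df, lower.tail = FALSE)

c(x2_equal = x2_equal, p_equal = p_equal)
c(x2_custom = x2_custom, p_custom = p_custom)
```

The two statistics should agree with the X-squared values reported by chisq.test() in the two outputs above.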

  2. Chi-Square Test of Independence
    • Application: when you have 1 dependent variable and 1 independent variable with 2 or more levels/factors
    • Hypothesis: Is there an association between categorical variable X and categorical variable Y?
    • H0: There is no association between the two variables.
    • H1: There is an association between the two variables.
    • Example: Is there an association between gender and the hours spent watching Netflix during a week?
# Creation of contingency table
chi_square_table <-table(qualtrics$` Selected Choice_1`,qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')

# Chi-square independence test
chisq.test(chi_square_table)

    Pearson's Chi-squared test

data:  chi_square_table
X-squared = 1.5739, df = 4, p-value = 0.8135

Since the p-value (0.8135) is higher than the significance level (0.05), we cannot reject the null hypothesis. Thus, we find no evidence of an association between gender and the number of hours spent watching Netflix. In other words, the data are consistent with the hours spent being independent of the participant’s gender.
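
Under the null hypothesis of independence, the expected count in each cell is (row total × column total) / n. As a minimal sketch, the expected counts can be derived from the marginal totals alone, here taken from the gender and hours tables printed earlier; the object names are illustrative.

```r
# Marginal totals, copied from the frequency tables above
gender_totals <- c(Male = 49, Female = 68)
hours_totals  <- c("Never" = 19, "1-2 hours" = 18, "3-4 hours" = 22,
                   "5-6 hours" = 35, "more than 6 hours" = 23)
n <- sum(gender_totals)  # 117 respondents in total

# Expected count per cell if gender and hours were independent
expected <- outer(gender_totals, hours_totals) / n
round(expected, 2)

# Smallest expected count, useful for judging whether the Chi-square
# approximation is reasonable
min(expected)
```

The Chi-square statistic then measures how far the observed contingency table deviates from these expected counts.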

Multiple choice with multiple answers

Before we conduct any test, we will do some simple calculations and visualise our data.

# Rename columns
colnames(qualtrics)[38] <- "ja!Naturlich"
colnames(qualtrics)[39] <- "Clever"
colnames(qualtrics)[40] <- "Spar Vital"
colnames(qualtrics)[41] <- "..."

# Replacing NA with 0
qualtrics$`ja!Naturlich`[is.na(qualtrics$`ja!Naturlich`)]=0
qualtrics$Clever[is.na(qualtrics$Clever)]=0
qualtrics$`Spar Vital`[is.na(qualtrics$`Spar Vital`)]=0
qualtrics$...[is.na(qualtrics$...)]=0

# Calculating frequency, percentage of responses (share of all ticks) and percentage of cases (share of respondents)
df.cochran <- data.frame(Frequency = colSums(qualtrics[38:41]),
                         Share_of_responses = (colSums(qualtrics[38:41])/sum(qualtrics[38:41]))*100,
                         Share_of_cases = (colSums(qualtrics[38:41])/nrow(qualtrics[38:41]))*100)
# Observing
df.cochran

# Visualisation
barplot(df.cochran[,3], names.arg = row.names(df.cochran), main = "% of Respondents familiar with brands", xlab = "Brand",ylab = "Percentage")

The visualisation above shows that more than 60% of respondents are familiar with the brand “ja!Naturlich”, while we cannot say the same for the other brands considered in our question.

For the analysis of results collected with a multiple choice question with multiple possible answers, we can use Cochran’s Q test. Although we have not mentioned it before, it is not too different from what you have already learned about other tests.

The Cochran’s Q test and associated multiple comparisons require the following assumptions:

  1. Responses are dichotomous and from k matched samples.
  2. The subjects are independent of one another and were selected at random from a larger population.
  3. The sample size is sufficiently “large”. (As a rule of thumb, the number of subjects for which the responses are not all 0’s or 1’s, n, should be ≥ 4 and nk should be ≥ 24.)

In a within-subjects experimental design with three or more observations of a dichotomous (= just two levels such as “Yes” or “No”) categorical outcome, you use Cochran’s Q test to assess main effects. Similarly, in our multiple choice question with multiple answers, the same respondent goes through three or more potential answers, each with a dichotomous (= yes or no) categorical outcome.
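
To make the idea concrete, the Q statistic can be sketched in a few lines of base R using the textbook formula Q = (k − 1)(k ΣC² − N²) / (k N − ΣR²), where C are the column totals, R the row totals, N the grand total and k the number of answer options. The function name cochran_q and the simulated 0/1 data are hypothetical; this is not the implementation used by any particular package.

```r
# Hypothetical helper: Cochran's Q for a 0/1 matrix
# (rows = respondents, columns = answer options)
cochran_q <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)               # number of answer options
  col_totals <- colSums(x)   # how often each option was ticked
  row_totals <- rowSums(x)   # how many options each respondent ticked
  N <- sum(x)                # total number of ticks
  # Note: the denominator is 0 if every respondent ticked all or none
  q <- (k - 1) * (k * sum(col_totals^2) - N^2) / (k * N - sum(row_totals^2))
  list(statistic = q,
       df = k - 1,
       p.value = pchisq(q, df = k - 1, lower.tail = FALSE))
}

# Simulated example: 50 respondents, three made-up brands
set.seed(1)
toy <- cbind(brand_A = rbinom(50, 1, 0.7),
             brand_B = rbinom(50, 1, 0.4),
             brand_C = rbinom(50, 1, 0.3))
cochran_q(toy)
```

A small p-value suggests that at least one option is ticked more (or less) often than the others, which is the question the test answers for multiple-answer data.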

library(nonpar)

# Creation of matrix
#matrix.cochran <- cbind(qualtrics$`ja!Naturlich`,
#                   qualtrics$Clever,
#                   qualtrics$`Spar Vital`,
#                   qualtrics$`...`)
# Turning NAs to 0
#matrix.cochran[is.na(matrix.cochran)]=0

# Cochran test                   
#cochrans.q(matrix.cochran, alpha = 0.05)

A p-value of less than 0.05 indicates that there is enough evidence to conclude that some of the store brands are better known among our respondents than others. In order to take a closer look, we need to conduct a post hoc test.

library(DescTools)
list.cochran <- list(qualtrics$`ja!Naturlich`,
                   qualtrics$Clever,
                   qualtrics$`Spar Vital`,
                   qualtrics$...) # imaginary brand

# Replacing NAs in the list with 0 in order to be able to run the test
list.cochran <- rapply(list.cochran, f=function(x) ifelse(is.na(x),0,x), how="replace" )

# Post hoc test (Dunn Test)
DunnTest(list.cochran, method="bonferroni")

 Dunn's test of multiple comparisons using rank sums : bonferroni  

    mean.rank.diff    pval    
2-1            -36  0.1093    
3-1            -18  1.0000    
4-1            -74 7.3e-06 ***
3-2             18  1.0000    
4-2            -38  0.0761 .  
4-3            -56  0.0014 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the results of the Dunn test, we can see that there is a significant difference between 1 (“ja!Natürlich”) and 4 (“…”), as well as between 4 (“…”) and 3 (“Spar Vital”).

Rank order question

A rank order question asks respondents to compare items to each other by placing them in order of preference. Note that the data obtained from a rank order question shows the order of a respondent’s preferences, but not the size of the difference between items. For instance, if the most important feature of a fitness tracker for a respondent XY is “Measuring steps” and the second most important feature is “Calories burned”, we don’t know how much more important the former is compared to the latter.

An intuitive question to ask is the following: which feature of the fitness tracker is the most important for our respondents?

We can answer this question by calculating a mean rank for each feature. Before we do so, we will create a separate data frame and add columns of the response data.

rank.data <- data.frame(qualtrics$` Measuring steps`,
                        qualtrics$` Calories burned`,
                        qualtrics$` Measuring heartbeat`,
                        qualtrics$` Exercise tracking`,
                        qualtrics$` Measuring distance`)
colnames(rank.data)<-c("Measuring steps","Calories burned","Measuring heartbeat","Exercise tracking","Measuring distance")

The first piece of information we would like to have is how many preference combinations there are, and how often each one occurs. We can obtain that information by creating a summary of the ranking data frame we created.

library(pmr)
Loading required package: stats4
test <- rankagg(rank.data)
test
                 n
 [1,] 2 1 3 4 5 10
 [2,] 1 3 2 4 5 19
 [3,] 2 3 1 4 5 17
 [4,] 1 2 4 3 5  4
 [5,] 4 2 1 3 5  3
 [6,] 3 2 1 5 4 15
 [7,] 1 3 5 2 4 10
 [8,] 1 2 4 5 3 10
 [9,] 2 4 1 5 3  9
[10,] 1 2 5 4 3  9
[11,] 5 4 3 1 2  3
[12,] 2 3 4 5 1  8

The matrix we received as an output is the summary of our ranking data. It shows that, for instance, the preference combination “2,1,3,4,5” occurs 10 times in the data frame. More specifically, it means that there are 10 respondents who prefer item 2 (“Calories burned”) the most, then item 1 (“Measuring steps”), and so on.

Now we can calculate the mean rank for each feature and conclude which feature is the most important to our respondents:

# Mean rank of each fitness tracker feature
destat(test)$mean.rank
Descriptive statistics of ranking data: 
$mean.rank: mean ranks; $pair: pairs; $mar: marginals
[1] 1.811966 2.581197 2.598291 4.051282 3.957265

As we can observe from the output, item 1 (“Measuring steps”) shows the best (lowest) mean rank among all items. Therefore, we can assume that “Measuring steps” is the most important feature for our respondents. However, to test this statistically and check that it is not merely due to chance, we can conduct the Friedman rank sum test.
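
As a cross-check, the mean ranks can be recomputed in base R from the summary matrix alone. The sketch below assumes that each column of the summary holds the rank given to the corresponding feature and that the last column is the number of respondents with that pattern; the object names patterns, freq and expanded are illustrative.

```r
# Rank patterns and their frequencies, typed in from the rankagg() output above
patterns <- matrix(c(2, 1, 3, 4, 5,
                     1, 3, 2, 4, 5,
                     2, 3, 1, 4, 5,
                     1, 2, 4, 3, 5,
                     4, 2, 1, 3, 5,
                     3, 2, 1, 5, 4,
                     1, 3, 5, 2, 4,
                     1, 2, 4, 5, 3,
                     2, 4, 1, 5, 3,
                     1, 2, 5, 4, 3,
                     5, 4, 3, 1, 2,
                     2, 3, 4, 5, 1), ncol = 5, byrow = TRUE)
colnames(patterns) <- c("Measuring steps", "Calories burned",
                        "Measuring heartbeat", "Exercise tracking",
                        "Measuring distance")
freq <- c(10, 19, 17, 4, 3, 15, 10, 10, 9, 9, 3, 8)

# Repeat each pattern as often as it occurs: one row per respondent
expanded <- patterns[rep(seq_len(nrow(patterns)), times = freq), ]

# Mean rank per feature (a lower value means a more important feature)
mean_ranks <- colMeans(expanded)
round(mean_ranks, 6)
```

Under this reading of the summary, the values should match the mean ranks reported by destat() above.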

The Friedman rank sum test is used to identify whether there are any statistically significant differences between the distributions of 3 or more paired groups. It is used when the normality assumptions for a one-way repeated measures ANOVA are not met. The Friedman rank sum test is also used when the dependent variable is measured on an ordinal scale, as in our case.
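
For complete rankings without ties, the Friedman statistic depends only on the rank sums of the items: χ² = 12 / (n k (k + 1)) · ΣR² − 3 n (k + 1), with n respondents, k items and rank sums R. A minimal sketch, assuming n = 117 respondents and using the mean ranks printed above (the object names are illustrative):

```r
n <- 117  # number of respondents
k <- 5    # number of ranked features

# Mean ranks as printed by destat() above
mean_rank <- c(1.811966, 2.581197, 2.598291, 4.051282, 3.957265)

# Rank sums are whole numbers: mean rank times number of respondents
rank_sum <- round(mean_rank * n)

# Friedman chi-squared statistic for rankings without ties
friedman_stat <- 12 / (n * k * (k + 1)) * sum(rank_sum^2) - 3 * n * (k + 1)

# p-value from the chi-squared distribution with k - 1 degrees of freedom
p_value <- pchisq(friedman_stat, df = k - 1, lower.tail = FALSE)

friedman_stat
p_value
```

The more the rank sums differ from each other, the larger the statistic becomes; if all features were ranked equally on average, it would be close to 0.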

Before we conduct the Friedman rank sum test, we will visualise our data:


# Turn the data frame from the wide format into the long format with melt(). The head and tail of the new data frame show that it contains just two columns, "Feature" and "Rank".

rank.data.long <- reshape2::melt(rank.data,value.name = "Rank",variable.name = "Feature", stringsAsFactors=TRUE)
No id variables; using all as measure variables
attributes are not identical across measure variables; they will be dropped
tail(rank.data.long)
head(rank.data.long)

# Visualisation
ggstatsplot::ggwithinstats(
  data = rank.data.long,
  x = Feature,
  y = Rank,
  type = "np",
  pairwise.comparisons = TRUE, # show pairwise comparison test results
  title = "What features are important to you when evaluating fitness trackers?")

Already from the advanced visualisation, which includes the Friedman rank sum test and pairwise comparisons, we can gain insight into the significance of the differences among features.

# Friedman test 
friedman.test(as.matrix(rank.data))

    Friedman rank sum test

data:  as.matrix(rank.data)
Friedman chi-squared = 176.42, df = 4, p-value < 2.2e-16

The Friedman rank sum test has a p-value lower than 0.05, so we can conclude that there are significant differences between at least two features (as we have already seen in our visualisation). Even though we have already identified differences between preferences for features in our advanced visualisation, we will conduct a post hoc test in order to demonstrate the traditional way of calculating pairwise comparisons.

knitr::kable(rstatix::wilcox_test(Rank ~ Feature, paired = TRUE, p.adjust.method = "bonferroni", data = rank.data.long))

| .y. | group1 | group2 | n1 | n2 | statistic | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| Rank | Measuring steps | Calories burned | 117 | 117 | 1369.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring steps | Measuring heartbeat | 117 | 117 | 2231.0 | 0.000753 | 0.008 | ** |
| Rank | Measuring steps | Exercise tracking | 117 | 117 | 354.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring steps | Measuring distance | 117 | 117 | 367.5 | 0.000000 | 0.000 | **** |
| Rank | Calories burned | Measuring heartbeat | 117 | 117 | 3214.5 | 0.512000 | 1.000 | ns |
| Rank | Calories burned | Exercise tracking | 117 | 117 | 610.5 | 0.000000 | 0.000 | **** |
| Rank | Calories burned | Measuring distance | 117 | 117 | 940.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring heartbeat | Exercise tracking | 117 | 117 | 1235.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring heartbeat | Measuring distance | 117 | 117 | 1307.5 | 0.000000 | 0.000 | **** |
| Rank | Exercise tracking | Measuring distance | 117 | 117 | 3534.5 | 0.816000 | 1.000 | ns |

The output table provides us with p-values that indicate whether the difference in ranks is significant for each pair of features. For instance, the first four rows show that the differences between the ranks of the feature “Measuring steps” and each of the other features are significant. Consequently, we can conclude that this feature stands out as the most important one among our respondents.
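
The two steps above (omnibus Friedman test, then Bonferroni-adjusted pairwise Wilcoxon signed-rank tests) can also be sketched with base R functions only. The rankings below are simulated and the feature names "A" to "D" are placeholders, so the numbers have nothing to do with the survey results.

```r
set.seed(1)

# Simulated rankings: 30 hypothetical respondents rank 4 features
# (1 = most important). Feature "A" is made more likely to be ranked first.
n <- 30
ranks <- t(replicate(n, {
  scores <- rnorm(4, mean = c(-1.5, 0, 0, 0))  # lower score = better rank
  rank(scores)
}))
colnames(ranks) <- c("A", "B", "C", "D")

# Omnibus test: are there any differences between the features?
ft <- friedman.test(ranks)
ft

# Post hoc test: paired Wilcoxon tests with Bonferroni correction.
# as.vector() reads the matrix column by column, so the respondents
# appear in the same order within every feature.
long <- data.frame(
  Rank    = as.vector(ranks),
  Feature = factor(rep(colnames(ranks), each = n))
)
ph <- pairwise.wilcox.test(long$Rank, long$Feature,
                           paired = TRUE, exact = FALSE,
                           p.adjust.method = "bonferroni")
ph$p.value
```

The post hoc test is only interpreted if the omnibus test is significant; the adjusted p-values then show which pairs of features differ.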

Another question that may be interesting to explore is whether there are any complementary features, or features that overlap in their functionality. To investigate this, we can look at the correlation between the ranks assigned to each feature.

#Correlation Matrix
cor.matrix<-cor(rank.data, method=c('spearman'))
cor.matrix
                    Measuring steps Calories burned Measuring heartbeat
Measuring steps          1.00000000     -0.04651331          -0.6569094
Calories burned         -0.04651331      1.00000000          -0.2221626
Measuring heartbeat     -0.65690943     -0.22216264           1.0000000
Exercise tracking        0.29633223     -0.10838758          -0.3255840
Measuring distance      -0.05958032     -0.11694481          -0.3817895
                    Exercise tracking Measuring distance
Measuring steps             0.2963322        -0.05958032
Calories burned            -0.1083876        -0.11694481
Measuring heartbeat        -0.3255840        -0.38178948
Exercise tracking           1.0000000        -0.47176821
Measuring distance         -0.4717682         1.00000000

At first glance we can observe a lot of negative values, meaning that the ranks of many features are negatively correlated with each other. To make the interpretation easier, we will visualise the correlation matrix.

library(ggcorrplot)

Attaching package: ‘ggcorrplot’

The following object is masked from ‘package:rstatix’:

    cor_pmat
ggcorrplot(cor.matrix)

From the correlation matrix we can confirm that almost all features are negatively correlated with each other. An exception is the relationship between the features “Measuring steps” and “Exercise tracking”, which are positively correlated. This matrix can be useful for digging deeper into the relationships between preferences for features. For instance, we can assume that “Measuring steps” and “Exercise tracking” correlate positively because users see them as complementary features. Moreover, if we regard walking as a type of exercise (in the case of longer walking routes), we can assume that users who ranked “Exercise tracking” high also ranked “Measuring steps” high, because they perceive it as another type of “Exercise tracking”.
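
One caveat when reading such a matrix: because every respondent has to hand out each rank exactly once, a high rank for one feature forces lower ranks for the others, so some negative correlation is built into ranking data. The sketch below illustrates this with purely random rankings (simulated data, not our survey); with k ranked items the expected correlation between two items is -1/(k - 1).

```r
set.seed(42)

k <- 5      # number of ranked features
n <- 5000   # hypothetical respondents who rank purely at random

# Each row is one random ranking of the k features.
random.ranks <- t(replicate(n, sample(k)))

cm <- cor(random.ranks, method = "spearman")

# Average correlation between two different features ...
mean.offdiag <- mean(cm[upper.tri(cm)])
mean.offdiag

# ... compared with the theoretical value for random rankings.
-1 / (k - 1)
```

For our five features this baseline is -0.25, so it is the deviations from this value (such as the positive correlation between “Measuring steps” and “Exercise tracking”) that are most informative.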

Constant Sum question

If you wish to obtain information about how much one attribute is preferred over another, you may use a constant sum scale. The total box should always be displayed at the bottom to make it easier for respondents. A constant sum question allows us to collect ratio data. With the data obtained, we are able to express the relative importance of the options.
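
Before analysing constant sum data, it can be worth checking that every respondent really distributed the full number of points. A minimal sketch, assuming a total of 100 points and a made-up data frame `constant.sum.demo` (the column names and values are hypothetical):

```r
# Hypothetical constant sum answers: each respondent should distribute 100 points.
constant.sum.demo <- data.frame(
  Location = c(32, 25, 0),
  Price    = c(23, 30, 20),
  Ambience = c(32, 22, 20),
  Service  = c(13, 23, 50)   # the third respondent allocated only 90 points
)

# Check which rows add up to the required total.
row.totals <- rowSums(constant.sum.demo)
valid <- row.totals == 100
which(!valid)   # respondents whose answers need attention

# Relative importance: average share of points per factor (valid rows only).
shares <- colMeans(constant.sum.demo[valid, ] / 100)
shares
```

The shares sum to one, which is what allows us to speak about the relative importance of the factors.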

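Constant Sum Question

| Location | Price | Ambience | Customer Service | id |
|---|---|---|---|---|
| 32 | 23 | 32 | 13 | 1 |
| 25 | 30 | 22 | 23 | 2 |
| 19 | 21 | 30 | 30 | 3 |
| 20 | 20 | 20 | 40 | 4 |
| 30 | 30 | 10 | 30 | 5 |
| 0 | 20 | 20 | 60 | 6 |
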
# Compute descriptive statistics
library(pastecs) 

Attaching package: ‘pastecs’

The following objects are masked from ‘package:dplyr’:

    first, last
res <- stat.desc(constant.sum)
round(res[,1:4],2)
# Creation of the long version of data frame
constant.sum.long <-melt(constant.sum[,-5], variable.name ="Factor" ,value.name = "Points")
constant.sum.long
# Boxplot ggplot2
p<-constant.sum.long %>% 
  filter(Factor!="id") %>%
  ggplot(aes(x=Factor, y=Points, fill= Factor)) +
    geom_boxplot()  +
    theme_minimal() +
    ggtitle("What factors do you consider when choosing a place to go for a dinner?") +
    xlab("")
ggplotly(p)

With the data collected, we are able to answer the question: which factor is the most important for our respondents when they go out for dinner?

library(robCompositions)
Loading required package: pls

Attaching package: ‘pls’

The following object is masked from ‘package:stats’:

    loadings

Loading required package: data.table
data.table 1.13.0 using 2 threads (see ?getDTthreads).  Latest news: r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:pastecs’:

    first, last

The following objects are masked from ‘package:reshape2’:

    dcast, melt

The following object is masked from ‘package:DescTools’:

    %like%

The following objects are masked from ‘package:dplyr’:

    between, first, last

Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
sROC 0.1-2 loaded
constSum(constant.sum,100)[,-5]

In order to answer this question we need to conduct a repeated measures ANOVA. This type of ANOVA is used for analyzing data where the same subjects are measured more than once. In our case, every respondent is measured on each of the factors (location, price, ambience and customer service). Repeated measures ANOVA is an extension of the paired-samples t-test and is also referred to as a within-subjects ANOVA. In a within-subjects design, the same individuals are measured on the same outcome variable at different time points or under different conditions.

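Before we run this type of ANOVA, we need to check its assumptions. There are three assumptions to check. The first is that the dependent variable is approximately normally distributed at each level of the independent variable. Since we have more than 30 observations at each level, we can rely on the central limit theorem and do not need to test this further. The second assumption is that there are no extreme outliers, which we check below. The third assumption is sphericity, i.e. the variances of the differences between all pairs of levels should be equal. The anova_test() function used below tests this assumption, and get_anova_table() by default reports corrected degrees of freedom where it is violated (note the fractional degrees of freedom in the ANOVA table further down). Let’s start with the potential outliers: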

# Outliers
constant.sum.long %>% 
  group_by(Factor) %>%
  identify_outliers(Points)

As we cannot identify any extreme outliers, we can proceed with the repeated measures ANOVA.
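
To make the notion of an "extreme" outlier more concrete, the sketch below applies the usual boxplot convention by hand: values more than 1.5 interquartile ranges outside the quartiles count as outliers, and values more than 3 interquartile ranges outside count as extreme. We assume here that this is the rule behind identify_outliers(); the data are invented.

```r
# Hypothetical points given to one factor, with one unusually large value.
points <- c(10, 20, 20, 25, 25, 30, 30, 35, 40, 100)

q   <- unname(quantile(points, c(0.25, 0.75)))  # first and third quartile
iqr <- q[2] - q[1]                              # interquartile range

# Outlier: more than 1.5 * IQR below Q1 or above Q3.
is.outlier <- points < q[1] - 1.5 * iqr | points > q[2] + 1.5 * iqr

# Extreme outlier: more than 3 * IQR below Q1 or above Q3.
is.extreme <- points < q[1] - 3 * iqr | points > q[2] + 3 * iqr

data.frame(points, is.outlier, is.extreme)
```

In this toy example only the value 100 is flagged, and it is flagged as extreme.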

# Formatting data 
constant.sum.aov <- gather(constant.sum, key = "Factor", value = "Points", ` Location`,` Price`,` Ambience`,` Customer Service`)
attributes are not identical across measure variables;
they will be dropped
# One-way repeated measures ANOVA  
res.aov <- anova_test(data = constant.sum.aov, dv = Points,wid = id ,within = Factor)
get_anova_table(res.aov)
ANOVA Table (type III tests)

  Effect  DFn    DFd      F        p p<.05   ges
1 Factor 2.56 297.36 33.668 1.06e-16     * 0.225
# Post hoc test
pairwise.t.test(constant.sum.long$Points,constant.sum.long$Factor, paired = T, p.adjust.method = "holm")

    Pairwise comparisons using paired t tests 

data:  constant.sum.long$Points and constant.sum.long$Factor 

                   Location  Price  Ambience
 Price            2.7e-15   -      -        
 Ambience         3.2e-10   0.030  -        
 Customer Service < 2e-16   0.742  0.079    

P value adjustment method: holm 

Now we can see that our respondents weight price more heavily than location or ambience, while customer service is perceived as about as important as price (the difference between these two is not significant).
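
For readers who want to see the same workflow without rstatix, here is a minimal base R sketch of a one-way repeated measures ANOVA followed by Holm-adjusted paired t-tests. The data are simulated, the object names are hypothetical, and, unlike get_anova_table(), aov() does not apply any sphericity correction.

```r
set.seed(7)

n <- 20                                   # hypothetical number of respondents
factors <- c("Location", "Price", "Ambience", "Service")

# Long format: one row per respondent and factor.
# (Simplified: the simulated points are not forced to sum to 100.)
demo <- data.frame(
  id     = factor(rep(seq_len(n), times = length(factors))),
  Factor = factor(rep(factors, each = n)),
  Points = c(rnorm(n, 15, 5), rnorm(n, 35, 5), rnorm(n, 25, 5), rnorm(n, 25, 5))
)

# Repeated measures ANOVA: the Error() term tells aov() that the
# observations are nested within respondents.
fit.rm <- aov(Points ~ Factor + Error(id / Factor), data = demo)
summary(fit.rm)

# Post hoc: paired t-tests with Holm correction. Rows are ordered by
# respondent within each factor, so the pairing is correct.
ph.rm <- pairwise.t.test(demo$Points, demo$Factor,
                         paired = TRUE, p.adjust.method = "holm")
ph.rm$p.value
```

The F-test in the within-subjects stratum corresponds to the "Factor" row of the ANOVA table above, and the matrix of adjusted p-values is read in the same way as the post hoc output.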

ggstatsplot::ggwithinstats(
  data = constant.sum.long %>% filter(Factor!="id"), # excluding "id" column from the data
  x = Factor,
  y = Points,
  type = "p",
  pairwise.comparisons = TRUE, # show pairwise comparison test results
  title = "What factors do you consider when choosing a place to go for a dinner?")

Text or number entry question

A text or number entry question is recommended if you are interested in obtaining ratio data. We will use this type of question together with a constant sum question to collect data that can be analysed with regression analysis. Note that in this case we treat constant sum data as ratio data and therefore assume that 0 means complete absence.

Here is a glimpse of the answers on how important each factor is to our respondents when it comes to dining out:

Constant sum question

| Location | Price | Ambience | Customer Service |
|---|---|---|---|
| 32 | 23 | 32 | 1 |
| 25 | 30 | 22 | 43 |
| 19 | 21 | 30 | 34 |
| 20 | 20 | 20 | 46 |
| 30 | 30 | 10 | 17 |
| 0 | 20 | 20 | 4 |

Additionally, we asked our respondents how much they are willing to spend on dinner on average. To make the data easier to handle, we will create a new data frame that merges all the data:

dinner <- subset(qualtrics, select = c(" Location"," Price"," Ambience"," Customer Service", " Willingness-to-pay (in EUR)"))
knitr::kable(head(dinner))
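
| Location | Price | Ambience | Customer Service | Willingness-to-pay (in EUR) |
|---|---|---|---|---|
| 32 | 23 | 32 | 1 | 29 |
| 25 | 30 | 22 | 43 | 77 |
| 19 | 21 | 30 | 34 | 52 |
| 20 | 20 | 20 | 46 | 31 |
| 30 | 30 | 10 | 17 | 22 |
| 0 | 20 | 20 | 4 | 35 |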

Before we conduct a linear regression analysis, we need to take a look at the correlation matrix:

correlation <-cor(dinner, method=c('pearson'))
correlation
                               Location       Price    Ambience
 Location                     1.0000000 -0.31732620 -0.36134355
 Price                       -0.3173262  1.00000000 -0.21962027
 Ambience                    -0.3613436 -0.21962027  1.00000000
 Customer Service            -0.1668810  0.08894752 -0.02405881
 Willingness-to-pay (in EUR)  0.1414540 -0.07438388 -0.32550607
                              Customer Service  Willingness-to-pay (in EUR)
 Location                          -0.16688104                   0.14145397
 Price                              0.08894752                  -0.07438388
 Ambience                          -0.02405881                  -0.32550607
 Customer Service                   1.00000000                   0.12125571
 Willingness-to-pay (in EUR)        0.12125571                   1.00000000

From our data we see, for instance, a moderate negative correlation between willingness to pay and the importance of ambience (r = -0.33) as well as a weak positive correlation between the importance of customer service and willingness to pay (r = 0.12). Let us look at the descriptive statistics as well:

knitr::kable(psych::describe(dinner))

|  | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Location | 1 | 117 | 12.14530 | 10.85823 | 10 | 11.25263 | 14.8260 | 0 | 40 | 40 | 0.3585257 | -0.8903393 | 1.003844 |
| Price | 2 | 117 | 31.48718 | 16.22079 | 30 | 29.83158 | 14.8260 | 0 | 100 | 100 | 1.5662904 | 4.1917874 | 1.499613 |
| Ambience | 3 | 117 | 25.76068 | 13.97822 | 20 | 25.09474 | 14.8260 | 0 | 60 | 60 | 0.3807401 | -0.3100357 | 1.292286 |
| Customer Service | 4 | 117 | 49.35897 | 29.47777 | 47 | 49.29474 | 40.0302 | 0 | 98 | 98 | 0.0342022 | -1.1897398 | 2.725221 |
| Willingness-to-pay (in EUR) | 5 | 117 | 32.99145 | 26.26801 | 30 | 30.28421 | 29.6520 | 0 | 110 | 110 | 0.8007002 | 0.0124325 | 2.428479 |

We see that the difference between mean and median does not, at first sight, suggest a strong effect of outliers. Let us now run the linear regression analysis:

mlr.dinner <- lm(` Willingness-to-pay (in EUR)` ~ ` Location` + ` Price` + ` Ambience`+` Customer Service`, data = dinner)
summary(mlr.dinner)

Call:
lm(formula = ` Willingness-to-pay (in EUR)` ~ ` Location` + ` Price` + 
    ` Ambience` + ` Customer Service`, data = dinner)

Residuals:
    Min      1Q  Median      3Q     Max 
-40.810 -18.205  -3.314  14.059  74.274 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         55.31553   11.57393   4.779 5.38e-06 ***
` Location`         -0.06739    0.25556  -0.264 0.792503    
` Price`            -0.28455    0.16117  -1.765 0.080205 .  
` Ambience`         -0.69755    0.19088  -3.654 0.000394 ***
` Customer Service`  0.10988    0.07931   1.386 0.168646    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.72 on 112 degrees of freedom
Multiple R-squared:  0.1449,    Adjusted R-squared:  0.1144 
F-statistic: 4.745 on 4 and 112 DF,  p-value: 0.001421

Out of all the factors that matter when dining out, the only one that is significant at the 0.05 level is ambience. From the summary we can conclude that an increase in the importance of ambience by 1 point is associated with a decrease in willingness to pay of about 0.70 EUR, holding the other factors constant.

confint(mlr.dinner)
                          2.5 %      97.5 %
(Intercept)         32.38327198 78.24779707
` Location`         -0.57374787  0.43897062
` Price`            -0.60389395  0.03479312
` Ambience`         -1.07575993 -0.31934814
` Customer Service` -0.04725424  0.26701295

From the confidence intervals, we can conclude that when all four importance scores (location, price, ambience and customer service) are equal to zero, the expected willingness to pay lies between 32.38 EUR and 78.25 EUR. Besides that, for each one-point increase in the importance of ambience, willingness to pay decreases on average by between 0.32 EUR and 1.08 EUR.
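
It may help to see where these intervals come from. A 95% confidence interval for a coefficient is the estimate plus or minus the critical t-value times the standard error. The sketch below plugs in the ambience estimate, its standard error and the residual degrees of freedom from the regression summary above; the object names are our own.

```r
# Values taken from the regression summary above.
est <- -0.69755     # estimate for ambience
se  <- 0.19088      # its standard error
df  <- 112          # residual degrees of freedom

# Critical value of the t-distribution for a 95% interval.
t.crit <- qt(0.975, df = df)

# Lower and upper limit: estimate -/+ t.crit * standard error.
ci.ambience <- est + c(-1, 1) * t.crit * se
round(ci.ambience, 2)
```

Up to rounding, this reproduces the limits reported by confint() for ambience (about -1.08 and -0.32).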

ggcoefstats(x = mlr.dinner,
            title = "Willingness to pay predicted by importance of factors")
New names:
* NA -> ...1
* NA -> ...2
* NA -> ...3
* NA -> ...4

There are a couple of things we need to consider when we run a multiple linear regression. One of them is potential outliers in our data. Here we identify them:

# Outliers
outlier_values <- boxplot.stats(mlr.dinner$residuals)$out  # outlier values.
outlier_values
      12       44       49 
70.56037 64.19796 74.27359 

We have identified three observations (12, 44 and 49) with unusually large residuals. We can visualize them as well:

boxplot(mlr.dinner$residuals, main="Willingness to pay", boxwex=0.1)

In addition, we need to observe whether there are any influential observations:

plot(mlr.dinner,4)

A rule of thumb for deciding whether an observation should be classified as influential is to look for observations with a Cook’s distance > 1. We see from the graph that there are no influential observations.
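
Instead of reading the values off the plot, the rule of thumb can also be checked numerically with cooks.distance(). The sketch below uses simulated data in which the last observation is deliberately placed far away from the others; it is not our dinner data.

```r
set.seed(11)

# 30 well-behaved observations plus one deliberately unusual point:
# a very large x-value combined with a y-value far from the trend.
x <- c(rnorm(30), 8)
y <- c(2 * x[1:30] + rnorm(30), -20)

fit.cook <- lm(y ~ x)

# Cook's distance for every observation.
cd <- cooks.distance(fit.cook)

# Observations exceeding the rule-of-thumb threshold of 1.
which(cd > 1)
```

For a fitted model such as `mlr.dinner`, the same call `which(cooks.distance(mlr.dinner) > 1)` would return an empty result if no observation is influential by this rule.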

Another thing to consider is linearity, i.e. that the relationship between the dependent and the independent variable can be reasonably approximated in linear terms:

# Linear specification
library(car)
Loading required package: carData

Attaching package: ‘car’

The following object is masked from ‘package:DescTools’:

    Recode

The following object is masked from ‘package:dplyr’:

    recode
avPlots(mlr.dinner)

In our example it does not seem that linear relationships can be reasonably assumed for all variables.

As we already learned, another important assumption of the linear model is that the error terms have a constant variance (i.e., homoscedasticity):

# Breusch-Pagan Test
library(lmtest)
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric
bptest(mlr.dinner)

    studentized Breusch-Pagan test

data:  mlr.dinner
BP = 1.1478, df = 4, p-value = 0.8866

The null hypothesis for this test is that the error variances are all equal. Our result is not significant (p = 0.89), so we do not reject the null hypothesis and find no evidence that this assumption is violated.
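
The idea behind the test can be sketched by hand: regress the squared residuals on the predictors and check whether the predictors explain them. The statistic below is the sample size times the R-squared of this auxiliary regression, which, to our understanding, corresponds to the studentized version that bptest() reports by default. The data are simulated so that the error variance grows with x.

```r
set.seed(5)

n <- 200
x <- runif(n, 1, 10)
y <- 3 + 2 * x + rnorm(n, sd = x)   # error spread increases with x

fit.bp <- lm(y ~ x)

# Auxiliary regression: do the predictors explain the squared residuals?
aux <- lm(resid(fit.bp)^2 ~ x)

# Test statistic and p-value (chi-squared with one df per predictor).
bp      <- n * summary(aux)$r.squared
p.value <- pchisq(bp, df = 1, lower.tail = FALSE)
c(BP = bp, p.value = p.value)
```

Here the p-value is small because we built heteroscedasticity into the data; in our dinner example the p-value of 0.89 points in the opposite direction.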

Another assumption to be met is that the error term is normally distributed. One way to check this is to employ statistical tests with the null hypothesis that the data are normally distributed. One of these is the Shapiro–Wilk test:

shapiro.test(resid(mlr.dinner))

    Shapiro-Wilk normality test

data:  resid(mlr.dinner)
W = 0.94757, p-value = 0.0001763

When the assumption of normally distributed errors is not met (as it is not met in our case), this might again be due to a misspecification of your model, in which case it might help to transform your data.

Finally, we need to check for multicollinearity, the case when there is a strong linear relationship between the independent variables:

correlation <-cor(dinner, method=c('pearson'))
correlation
                               Location       Price    Ambience
 Location                     1.0000000 -0.31732620 -0.36134355
 Price                       -0.3173262  1.00000000 -0.21962027
 Ambience                    -0.3613436 -0.21962027  1.00000000
 Customer Service            -0.1668810  0.08894752 -0.02405881
 Willingness-to-pay (in EUR)  0.1414540 -0.07438388 -0.32550607
                              Customer Service  Willingness-to-pay (in EUR)
 Location                          -0.16688104                   0.14145397
 Price                              0.08894752                  -0.07438388
 Ambience                          -0.02405881                  -0.32550607
 Customer Service                   1.00000000                   0.12125571
 Willingness-to-pay (in EUR)        0.12125571                   1.00000000

By observing our correlation matrix, we can see that none of the correlations between the independent variables comes close to 0.8 or 0.9 in absolute value (the largest is -0.36, between location and ambience). Consequently, we conclude that there are no concerns regarding multicollinearity between the independent variables.
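
Pairwise correlations only reveal relationships between two predictors at a time. A common complement is the variance inflation factor (VIF), which is 1 / (1 - R²) from regressing each predictor on all the others. Below is a minimal base R sketch with simulated predictors (x1 and x2 are strongly related by construction); the names are placeholders and not our dinner variables.

```r
set.seed(9)

n  <- 150
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly related to x1 by construction
x3 <- rnorm(n)                        # unrelated to the other two
X  <- data.frame(x1, x2, x3)

# VIF for each predictor: regress it on the remaining predictors.
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

Values close to 1 indicate no problem, while large values (commonly cited thresholds are 5 or 10) point to multicollinearity. In this toy example x1 and x2 show inflated values, x3 does not.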

Reporting


Presentation guidelines & grading

Your performance in this part will be evaluated based on the following criteria:

Final presentation and submission

---
output:
  pdf_document:
    toc: yes
  html_notebook: default
  html_document: 
    toc: yes
    df_print: paged
---
<link rel="stylesheet" type="text/css" media="all" href="style.css" />

```{r, include=FALSE, fig.cap="A structure of the group project"}
knitr::opts_chunk$set(echo = TRUE, error = FALSE, warning = FALSE, message = FALSE)
```

# (PART) Group project {-}

# Intro {-}

Welcome to the introduction to the group project!

<div style="text-align: justify">

Here you will find all important points related to the group project. In the following subchapters, you can find explanations for each step you should take in your group project. Generally, the group project comprises two parts. In the first part, you will work with your group on creating and conducting a survey. Once you have created a draft of your questionnaire, you will receive feedback from us. After implementing the feedback, you will be ready to start collecting data! In the second part, you will have a chance to apply the statistical knowledge acquired in class and from the online script. Specifically, you will analyze the collected data, present your findings and finally submit your final report (R code and presentation).

</div>


```{r,echo=FALSE,out.width = '70%',fig.align='center',fig.cap="Structure of the group project"}
knitr::include_graphics("images/group_project.PNG")
```

## Groups {-}

## Topics for the group project {-}

```{r eval = TRUE, echo = FALSE, warning=FALSE, message = FALSE}
library(dplyr)
library(kableExtra)
mytable_sub = data.frame(
    Topic = c("Car-sharing vs. vehicle ownership",
             "Student canteen (Mensa) and the WU campus*",
             "Privacy and security in social media",
             "Consumer preferences for fair-trade products in the apparel industry",
             "The climate debate and green consumption",
             "Self-driving cars",
             "Freemium business models in the music industry",
             "Consumers’ attitude towards legal video streaming providers and piracy",
             "Impact of Coronavirus on online shopping services",
             "Implications of Coronavirus on brand loyalty",
             "Consumers’ willingness to pay for organic products",
             "The most liveable city in the world",
             "Design your own questionnaire on a topic of your choice"
             ),
    Description = c(
      "Develop a questionnaire to explore the attractiveness of car sharing options for consumers (e.g., Car2go). Are consumers willing and planning to replace a personal vehicle with a car sharing option? Is car sharing likely to affect the amount of driving? Which factors influence these decisions?",
      "Develop a questionnaire to measure students’ attitudes toward the canteen and other restaurants on the campus, and the drivers of these attitudes (e.g., quality of meals, price, etc.).",
      "Develop a questionnaire to measure consumers’ willingness to switch from WhatsApp to a secure messaging service (e.g., Threema). What are the main motives (e.g., security concerns, costs, usability, etc.), and are consumers willing to pay for the secure service provider?",
      "Develop a questionnaire to measure consumers’ preferences for sustainable brands and eco fashion. Conduct an experiment to determine whether there are different perceptions regarding the “Fair Trade” effect.",
      "The climate debate is currently at the top of the agenda of many news outlets. Some public figures who strongly favor one side dominate and emotionalize the debate (e.g., Greta Thunberg, Donald Trump). Explore to what extent consumers are willing to change their behavior (e.g., cut air travel) to help protect the environment. What factors influence the willingness to change (e.g., social factors, convenience)?",
      "Companies such as Google heavily invest in the development of self-driving cars. Develop a questionnaire to measure consumers’ attitude and usage intention for self-driving cars. What are the drivers and deterrents of the consumers’ willingness to adopt this innovation?",
      "Many music streaming services (e.g., Spotify) offer a baseline version free of charge to consumers but charge for a premium version with additional features. Develop a questionnaire to measure consumers’ attitude towards legal music streaming providers. What factors influence the attitude (e.g., occupation, gender, usage behavior etc.), and how could companies motivate consumers to convert to the premium version of the service?",
      "Video streaming providers like Netflix record a continuous increase in registered users. On the other hand, illegal video streaming portals (e.g., Popcorn Time) are heavily used by other consumers. Develop a questionnaire to measure consumers’ attitude and drivers (e.g. occupation, gender, usage behavior etc.) towards legal video streaming providers. What could be reasons for piracy?",
      "More and more people have been using online shopping services (e.g., Amazon Fresh, BillaOnline) to buy groceries since Coronavirus started spreading. Develop a questionnaire to measure consumers’ attitude and its drivers (e.g., price, service) towards the online services before and during the Coronavirus outbreak. Are consumers less price sensitive when shopping online than in offline stores? How likely are consumers to continue using online shopping services in the future?",
      "After the outbreak of Coronavirus, many consumers strayed from their normal shopping patterns and began stockpiling products. Many retailers worked around the clock with their selling partners to ensure the availability of all products and to bring on additional capacity. How did this affect brand loyalty? Did consumers try out alternative brands? If so, which factors drove consumers to change the brand they had used before?",
      "Develop a questionnaire to measure consumers’ willingness to pay for organic products (e.g., milk). How much are consumers willing to pay for organic milk vs. conventional milk? What is the observed price premium? How does this vary across consumers? What are the drivers? Does it reflect a desire to achieve better health, eat better quality food, or to contribute to environmental protection?",
      "Vienna is frequently listed as one of the most liveable cities in the world (e.g., by the Economist Intelligence Unit). Develop a questionnaire to investigate the reasons why Vienna ranks so high in different rankings. What are the factors that contribute to its image? Are there differences between different groups of people?",
      "Feel free to choose a topic of your own as well."
    ))

mytable_sub %>% kable(escape = T) %>%
  kable_paper(c("hover"), full_width = F)

```


# Part 1: Before collecting data {-}

<div style="text-align: justify">

An aim of this course is to develop your ability to translate business problems into actionable research questions and to design an adequate research plan to answer these questions. Therefore, you need to be equipped with knowledge on how to create a survey and properly conduct research. 

Generally, what you can expect from the survey design is similar to what one experiences in a relationship. If you try to take more than you commit, it doesn’t work out. On a more serious note, if you follow the guidelines mentioned here, you will avoid the usual traps your fellow colleagues were caught in.

In a research process, conducting a survey is a part of (primary) data collection. Before we collect data, we have to make sure that preceding steps are correctly done. However, in the following sections we will focus on the process of designing a questionnaire. Eventually, you will be able to collect relevant data and apply appropriate statistical tests.    

</div>


```{r,echo=FALSE,out.width = '70%',fig.align='center'}
knitr::include_graphics("research-process.PNG")
```

::: {.infobox_red .caution data-latex="{caution}"}
Note that this assignment may require you to deal with and integrate
knowledge that has not yet been covered in class! Students are
expected to read ahead and collect additional information to the
extent to which their project requires this.
:::


## Research design

<div style="text-align: justify">

As you aim to conduct real marketing research, you need to come up with a research design before you start writing down questions for a questionnaire. In particular, you should review the research questions, hypotheses and characteristics that influence the research design.  

If you are interested in the causal effect of one particular (independent) variable on another (dependent) variable, think about an experimental design that might allow you to manipulate this variable. In this case, you particularly have to decide on the following:  

* Which variable to manipulate?  
* Whether to use a between-subjects or within-subjects design?  
* The cause-effect sequence (the cause must occur before the effect)  
* The number of experimental conditions  
* Potential interactions and relationships with other variables (does the effect depend on another variable?)
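
In a between-subjects design, respondents should be assigned to the experimental conditions at random. Survey tools usually offer a built-in randomizer for this; the sketch below only illustrates the principle in R, with hypothetical respondent ids and condition names.

```r
set.seed(2024)

# 40 hypothetical respondents and two experimental conditions.
respondents <- data.frame(id = 1:40)
conditions  <- c("control", "treatment")

# Repeat the condition labels so that the groups are equally large,
# then shuffle them to assign respondents at random.
respondents$condition <- sample(rep(conditions, length.out = nrow(respondents)))

# Each condition should contain the same number of respondents.
group.sizes <- table(respondents$condition)
group.sizes
```

In a within-subjects design, by contrast, every respondent would see all conditions, so only the order of the conditions would be randomized.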

What you need to be careful about is **reverse causation**. This refers to the situation where the causal relationship could possibly run in the opposite direction from what we assumed in the first place. For instance, it is often assumed that an increase in individual income leads to an increase in well-being (happiness). However, some [research](https://www.ncbi.nlm.nih.gov/pubmed/16949692) suggests that the causation could run in the opposite direction, i.e. that an increase in an individual's well-being actually leads to an increase in income.  

Here are some examples of causal research design applications:  

* To assess how a product's country-of-origin impacts attractiveness across different countries.  
* To analyse the effects of rebranding on customer loyalty.  

```{r,echo=FALSE, out.width = '70%',fig.align='center'}
knitr::include_graphics("causation-effect.png")
```


If you would like to analyze the effects of multiple categorical or continuous (independent) variables on one continuous (dependent) variable, you might use a regression model. When doing this, you particularly have to decide on:  

* How to measure **the dependent variable (DV)**. This is particularly important, since you need a variable that is powerful in uncovering variation between subjects (e.g., open-ended questions, such as "How much are you willing to pay for this product?" are good candidates). Moreover, you also need to consider the nature of your DV, i.e. whether it is an interval, ordinal or categorical variable. The nature of your DV will heavily influence your choice of the correct statistical test.

* How to measure **the independent variables (IV)** (single-item vs. multi-item scales, categorical vs. continuous). Bear in mind that the nature of the IV, together with DV, affects your choice of a statistical test as well.  

* What other variables might cause the effect that you would like to investigate (to prevent omitted variable bias, i.e. variables that are not part of your model but still influence the dependent variable).

* Potential interactions (e.g., is the effect of variable X stronger for group A vs. B?)

</div>


```{r, echo=FALSE, out.width = '70%',fig.align='center'}
knitr::include_graphics("mlp-regression.png")
```
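
To illustrate the last point of the list above (potential interactions), here is a minimal sketch of how an interaction is specified in a regression model in R. The data are simulated so that the effect of x is stronger in group B than in group A; all variable names are hypothetical.

```r
set.seed(21)

n     <- 200
group <- factor(rep(c("A", "B"), each = n / 2))
x     <- runif(n, 0, 10)

# The effect of x on y is 1 in group A and 3 in group B.
y <- 5 + ifelse(group == "A", 1, 3) * x + rnorm(n)

# "x * group" includes both main effects and their interaction.
fit.int <- lm(y ~ x * group)
round(coef(fit.int), 2)
```

The coefficient `x` is the effect of x in the reference group A, and `x:groupB` is the additional effect in group B (about 2 by construction). A significant interaction term therefore indicates that the effect of x depends on the group.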

## Survey method  

In the next step you should review the type of survey method you will use.

At this point you need to think about the setting in which you aim to conduct your survey. For instance, should you do it face-to-face or rather online? Here you can find some advantages and disadvantages of online surveys:

```{r, echo=F, fig.align='center',out.width='50%'}
knitr::include_graphics("adv-disadv-online-questionnaire.png")
```

Here is the list of the online tools you can use to conduct an online survey (usually for free):  

- [Qualtrics](http://www.qualtrics.com/free-account/)
- [Google Forms](https://www.google.com/forms/about/)
- [SurveyMonkey](https://www.surveymonkey.com/)
- [Free Online Surveys](http://freeonlinesurveys.com/)
- [Kwik surveys](http://kwiksurveys.com/)

For the purpose of this course, we suggest using **Qualtrics**.

Creating a questionnaire in Qualtrics starts with creating a Qualtrics project. Each project consists of a survey, a distribution record, and a collection of responses and reports. There are three ways to create a questionnaire. First, you can create a new survey project from scratch. Second, you can create a new questionnaire from a copy of an existing questionnaire. Finally, you can create one from a template in your Survey Library or from an exported QSF file.

::: {.infobox .download data-latex="{download}"}
[Here you can find a template of a questionnaire in Qualtrics with guidelines and suggestions related to each question type.](./ExampleQuestionnaireQualtrics.qsf)
:::


In order to create a completely new questionnaire, you need to do the following:  

Go to the Projects page by clicking the Qualtrics XM logo or clicking Projects on the top-right.  

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('create-new-project.png')
```

Create a new project by clicking the blue button on the right side.  
In the "Create your own" section click on the survey button.

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('create-new-project-2.png')
```

Enter a name for your survey and get started with creating it.

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('new-survey.png')
```

If you would like to create a new questionnaire on the basis of an already existing one, choose "From a Copy". Subsequently, you need to indicate the questionnaire you would like to copy. Now you are good to go! 

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('survey-copy.png')
```

If there is a questionnaire in the Qualtrics Library you would like to use, then you need to choose "From Library", and indicate one library name in the dropdown menu. 

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('library-survey.png')
```

::: {.infobox_orange .hint data-latex="{hint}"}

Some useful tips when creating a questionnaire in Qualtrics:

* Add a progress bar so that respondents know how many pages are left (see "Look & Feel" menu in Qualtrics).

* Remember to activate the "Force Response" field under "Validation Options" if you don't want to allow respondents to skip questions.

* Check the usability on mobile devices using the preview option (make sure the "Mobile friendly" option is checked).
:::

## Questionnaire

After you have set everything up, you should develop 20 to 25 questions. There are some important points to keep in mind while developing a questionnaire:

* Information you are primarily interested in (dependent variable)
* Information which might explain the dependent variable (independent variables)
* Other factors related to both the dependent and the independent variables
* Who is answering the questions?

Once you have clarified these points, you are ready to start writing the content. Again, here are some important things to remember:

* The purpose of the questionnaire
* Why it is important for you and why it could be useful for the respondent
* How long it should take to complete & the final date for a reply
* Ask questions in a logical order & use the right type of questions
* Aim for brevity & use simple language

### Questionnaire and research design {-}

The questionnaire design should be aligned with the research design! Therefore, in the following sections we will explain some suggested steps on how to approach questionnaire creation.

Let's start with what a questionnaire is. A structured questionnaire is a research instrument designed to elicit specific information from a sample of a target population. It is usually administered in a standardized way with fixed-alternative questions (the same questions and response options for all respondents).

The objective of a questionnaire is threefold:

* to translate the information need into a set of specific questions that the respondent can and will answer,
* to motivate and encourage respondents to become involved, to cooperate, and to complete the questionnaire,
* to minimize response error.

### Content in a questionnaire {-}

In this step you start working on the content of your questions.

At the beginning of the questionnaire you should give a brief introduction to your respondents in the context of your research and the content of the questionnaire. Try to use simple language and avoid technical terms. Additionally, in the introduction you should state how long the survey will approximately take. 

When you start thinking about the questions to ask, there are several points to consider:  

* Is the question necessary?
* Will I obtain the needed information?  
* Are several questions needed instead of one?  
* What type of data can I collect by asking that question (categorical or continuous)?  

In your survey, try to avoid asking **double-barrelled questions.** A double-barrelled question is a single question that attempts to cover two issues. Such questions can confuse respondents and result in ambiguous responses. Instead, ask multiple questions in order to obtain the intended information.  


```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```

Do you think Nike Town offers better variety and prices than other Nike stores?    

```{block, type="incorrect", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```


```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```    

Do you think Nike Town offers better variety than other Nike stores?  
Do you think Nike Town offers better prices than other Nike stores?

```{block, type="correct", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```
           
### Inability and unwillingness to answer {-}  

The quality of the collected data depends heavily on your ability to reach the right participants. Therefore, you need to make sure that your respondents are able to answer your questions meaningfully.   

Examples and recommendations:  

* Not every household member might be informed about the monthly expenses for grocery purchases if someone else makes these purchases.   
* Use filter questions that measure familiarity and product use.  
* Include a “don’t know” option.  
* If you ask participants for monetary values (e.g., how much are you willing to pay for product XY?) across several EU countries, make sure you indicate the correct currency (e.g., CZK for the Czech Republic or HUF for Hungary).  
* Think about how mobile-friendly the layout of your survey is (if it is an online survey).
* Good practice suggests that there should be no more than 2 questions per page (for online surveys displayed on mobile phones).



If you are asking participants to recall certain brands, for instance, make sure you use an **unaided recall question:**  

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```    

What brands of soft drinks do you remember being advertised on TV last night?  

```{block, type="correct", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```


```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```  

Which of these brands were advertised last night on TV?  
a) Coca-Cola  
b) Pepsi  
c) Red Bull        
d) Evian     
e) Don’t know

```{block, type="incorrect", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```



If you are asking participants to list something, good practice is **to minimize the effort required by respondents:**  

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```  

Please check all the departments from which you purchased merchandise on your most recent shopping trip to a department store:    
a) Women’s dresses  
b) Men’s apparel  
c) Children’s apparel  
d) Cosmetics  
e) Jewelry    
f) Other (please specify) ___________

```{block, type="correct", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```  

Please list all the departments from which you purchased merchandise on your most recent shopping trip to department store X.    

```{block, type="incorrect", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```


If you are asking for information that could be considered sensitive (e.g., money, family life, political beliefs, religion), these questions should come at the end of the questionnaire. Moreover, it is advisable to provide response categories rather than asking for specific figures:  

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```  

Which one of the following categories best describes your household’s annual gross income?    
a) under 25.001 €    
b) 25.001€ to 50.000 €    
c) 50.001€ to 75.000 €    
d) 75.001€ to 100.000 €   
e) over 100.000 €   

```{block, type="correct", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```


```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```  

What is your household’s exact annual income?

```{block, type="incorrect", purl=FALSE}
\vspace{-0.10in}
\vspace{-0.10in}
```

### Decide on measurement scales and scaling techniques {-}

Every statistical analysis requires that the variables have a specific level of measurement. The measurement scales you choose for your survey questions affect the answers you get and, eventually, the statistical tests you can apply.
For instance, it would not make sense to compute an average of genders, because an average of a categorical variable is not meaningful. Even if you coded gender numerically (e.g., male=0, female=1) and computed the average, the result would not be an "average gender"; at best it can be read as the share of respondents coded 1.
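
The point about averaging a categorical variable can be checked directly in R. The following is a minimal sketch with made-up data; the object names (`gender`, `female01`) are assumptions chosen for illustration only.

```r
# Hypothetical example data: gender stored as a categorical variable (factor)
gender <- factor(c("male", "female", "female", "male", "female"))

# mean() is not defined for factors: R returns NA (and issues a warning)
avg_gender <- suppressWarnings(mean(gender))
print(avg_gender)

# Appropriate summaries for nominal data are counts and proportions
counts <- table(gender)
shares <- prop.table(counts)
print(counts)
print(shares)

# With 0/1 coding, the "mean" is simply the share of respondents coded 1
female01 <- as.integer(gender == "female")
share_female <- mean(female01)
print(share_female)
```

For nominal variables, frequency tables and proportions are therefore the summaries to report.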

::: {.infobox_red .caution data-latex="{caution}"}
It is crucial to become familiar with the possibilities of each scale **before** you add another question to your survey. Doing so significantly lowers the risk of collecting data you did not intend to collect and of being unable to apply the tests you planned to use.
:::

The following table gives a quick overview of the possibilities of each measurement scale:

```{r, echo=FALSE, out.width = '90%',fig.align='center'}
knitr::include_graphics("measurement-scale.png")
```

The table below shows a general procedure for choosing an appropriate analysis based on the measurement scale of your data and the number of variables. It lists the statistical analyses we covered during the course and aims to help you choose among them based on the nature of the dependent variable on the one hand, and the nature and number of your independent variables on the other hand: 

```{r, echo=FALSE, out.width = '90%',fig.align='center'}
knitr::include_graphics("overview-statistical-test.jpg")
```

::: {.infobox_red .caution data-latex="{caution}"}
We strongly recommend deciding which type of data you want to collect and which test you intend to use **before** you formulate a question and add it to the survey. Do NOT add questions without considering the type of data they will produce. Otherwise, you may end up with data you did not intend to collect or, worse, with data that are unsuitable for the test you planned to use.

A very useful overview of the statistical tests associated with different types of variables is available here: [CHOOSING THE CORRECT STATISTICAL TEST - UCLA](https://stats.idre.ucla.edu/other/mult-pkg/whatstat/)

:::


### The most frequent types of questions {-}

Here we want to show you the most frequent types of questions students use and what type of data can be collected by using them.

```{r, echo = FALSE, results='asis', warning=FALSE ,error=FALSE}

# Load in qualtRics package
library(qualtRics)
library(janitor)
library(sjlabelled)
library(kableExtra)

# Read the qualtrics survey data
qualtrics<-read_survey('new_qualtrics_response_data_final_final.csv')

# Using labels as column name

new.colnames <-colnames(label_to_colnames(qualtrics))
new.colnames <- make.unique(new.colnames, sep="_")
colnames(qualtrics)<- new.colnames

```

#### Multiple choice question {-}

A multiple choice question with a single answer is a type of closed-ended question that lets respondents select **one answer** from a defined list of choices. The type of data you obtain is **categorical.** 

```{r, echo=F, fig.align='center',out.width='72%', fig.cap="Multiple choice question with single answer"}
knitr::include_graphics('support-multiple-choice-question.png')
```

::: {.infobox_orange .hint data-latex="{hint}"}

Statistical tests to consider when analysing categorical data:

* **Fisher's exact test**
    + Used when the expected frequency in at least one cell is **less than 5**. When the expected frequencies in all cells are 5 or greater, the Chi-square test should be used.
    + 1 dependent variable and 1 independent variable with 2 or more levels
    + Hypothesis: Is there a significant difference between the observed and the expected cell frequencies?

* **Chi-square test**
    + **Goodness of fit:** when you only have 1 dependent variable and no independent variables
        - Hypothesis: Is there a significant difference between the observed and the expected cell frequencies?
    + **Chi-Square Test of Independence:** when you have 1 dependent variable and 1 independent variable with 2 or more levels.
        - Hypothesis: Is there an association between categorical variable X and categorical variable Y?
        
* **Binomial logistic regression**
    + Used when you have an independent variable measured on at least an interval scale and a categorical dependent variable that can take on exactly two values (1 or 0, i.e., yes or no).

* Categorical variables can be used as predictors in regression (as dummy variables).

:::
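
As a minimal sketch of how the tests listed above can be run in R, the following example uses simulated answers. The data frame `survey`, its columns, and the small table `small_tab` are hypothetical and only serve as an illustration; `chisq.test()` and `fisher.test()` are part of base R (package `stats`).

```r
set.seed(1)

# Hypothetical survey data: two single-answer multiple choice questions
survey <- data.frame(
  gender = sample(c("female", "male"), 120, replace = TRUE),
  brand  = sample(c("A", "B", "C"), 120, replace = TRUE)
)

# Chi-square goodness of fit: are the three brands chosen equally often?
gof <- chisq.test(table(survey$brand))
print(gof)

# Chi-square test of independence: is brand choice associated with gender?
tab <- table(survey$gender, survey$brand)
indep <- chisq.test(tab)
print(indep$expected)  # check that expected cell frequencies are large enough
print(indep)

# Fisher's exact test for a small (made-up) 2 x 2 table
small_tab <- matrix(c(3, 1, 1, 4), nrow = 2,
                    dimnames = list(group = c("g1", "g2"),
                                    answer = c("yes", "no")))
# Expected frequencies are below 5 here, so the Chi-square approximation
# is questionable (R would warn about it)
print(suppressWarnings(chisq.test(small_tab)$expected))
fisher <- fisher.test(small_tab)
print(fisher)
```

Inspecting `$expected` first is a simple way to decide between the Chi-square test and Fisher's exact test.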


```{r, echo=F, fig.align='center',out.width='72%',fig.cap="Multiple choice question with multiple answers"}
knitr::include_graphics('multiple-choice-question-multiple-answers.png')
```

It is important to distinguish between multiple choice questions with a single answer and those with multiple answers, as their analysis differs.

To analyse results collected with a multiple choice question that allows multiple answers, we can use **Cochran's Q test.** Although we did not mention it before, it is not too different from what you have already learned about other tests. 

::: {.infobox_orange .hint data-latex="{hint}"}
The Cochran’s Q test and associated multiple comparisons require the following assumptions:

  1. Responses are dichotomous and from k number of matched samples.
  2. The subjects are independent of one another and were selected at random from a larger population.
  3. The sample size is sufficiently “large”. (As a rule of thumb, the number of subjects for which the responses are not all 0’s or 1’s, n, should be ≥ 4 and nk should be ≥ 24)
:::
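
Base R does not ship a dedicated function for Cochran's Q test, so the sketch below computes the statistic from its textbook formula and compares it with a Chi-square distribution with k - 1 degrees of freedom. The helper `cochran_q()` and the answer matrix are hypothetical and written for illustration only; add-on packages offer ready-made implementations.

```r
# Cochran's Q for a respondents x options matrix of 0/1 answers
# (hypothetical helper written for this example)
cochran_q <- function(x) {
  x <- as.matrix(x)
  stopifnot(all(x %in% c(0, 1)))
  k       <- ncol(x)     # number of answer options (matched samples)
  col_tot <- colSums(x)  # how often each option was ticked
  row_tot <- rowSums(x)  # how many options each respondent ticked
  n_tot   <- sum(x)
  q <- (k - 1) * (k * sum(col_tot^2) - n_tot^2) /
    (k * n_tot - sum(row_tot^2))
  list(Q = q, df = k - 1,
       p_value = pchisq(q, df = k - 1, lower.tail = FALSE))
}

# Made-up answers of 6 respondents to a question with 3 tick boxes
answers <- matrix(c(1, 1, 0,
                    1, 0, 0,
                    1, 1, 1,
                    0, 0, 0,
                    1, 0, 0,
                    1, 1, 0),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("price", "quality", "location")))

res <- cochran_q(answers)
print(res)  # Q = 6 with 2 degrees of freedom, p is about 0.0498
```

Note that this toy data set is far too small to satisfy the sample size rule of thumb given above; it is only meant to make the calculation easy to follow.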

#### Rank order question {-}

```{r, echo=F, fig.align='center',out.width='72%', fig.cap="Rank order question"}
knitr::include_graphics('rank-order-question.png')
```

A rank order question asks respondents to compare items to each other by placing them in order of preference. Note that the data obtained from a rank order question show the order of a respondent's preferences, but not the size of the difference between items. For instance, if the most important feature of a fitness tracker for respondent XY is "Measuring steps" and the second most important feature is "Calories burned", we do not know how much more important the former is compared to the latter. 

In order to analyze results from a rank order question, we use the **Friedman rank sum test.**

::: {.infobox_orange .hint data-latex="{hint}"}
The Friedman rank sum test is used to identify whether there are any statistically significant differences between the distributions of 3 or more paired groups. It is used when the normality assumptions of a one-way repeated measures ANOVA are not met. The Friedman rank sum test is also used when the dependent variable is measured on an ordinal scale, as in our case.
:::
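
A minimal sketch of the Friedman rank sum test in R, assuming the rankings are stored in a matrix with one row per respondent and one column per ranked item. The rankings below are randomly generated and the feature names are hypothetical; `friedman.test()` is part of base R (package `stats`).

```r
set.seed(42)

# Hypothetical rank order data: 30 respondents rank 4 fitness tracker features
# (1 = most important, 4 = least important); each row is one respondent
features <- c("steps", "calories", "heart_rate", "sleep")
ranks <- t(replicate(30, sample(1:4)))
colnames(ranks) <- features

# Mean rank per feature as a descriptive summary (lower = more important)
print(colMeans(ranks))

# Friedman rank sum test: rows are respondents (blocks), columns are items
ft <- friedman.test(ranks)
print(ft)
```

Because the rankings here are purely random, no systematic difference between the features should be expected in this illustration.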

#### Constant Sum question {-}

```{r, echo=F, fig.align='center',out.width='72%', fig.cap="Constant sum question"}
knitr::include_graphics('constant-sum-question.png')
```

If you wish to obtain information about how much one attribute is preferred over another, you may use a constant sum scale. The total box should always be displayed at the bottom to make it easier for respondents. A constant sum question allows you to collect ratio data. With the data obtained, we are able to express the relative importance of the options.

With the data collected we are able to answer the question: which factor is the most important for our respondents when they go out for dinner?

In order to answer this question we need to conduct **a repeated measures ANOVA**.

::: {.infobox_orange .hint data-latex="{hint}"}
This type of ANOVA is used for analyzing data where the same subjects are measured more than once. In our case, every respondent is measured on each of the factors (location, price, ambience and customer service). Repeated measures ANOVA is an extension of the paired-samples t-test and is also referred to as a within-subjects ANOVA. In a within-subjects experimental design, the same individuals are measured on the same outcome variable at different time points or under different conditions.
:::
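
The following sketch shows one way to run a repeated measures ANOVA on constant sum data with base R's `aov()` and an `Error()` term. The data are simulated, and all object and column names (`pts`, `long`, `respondent`, `attribute`, `points`) are assumptions made for this example. Keep in mind that constant sum answers add up to a fixed total for each respondent, so the results should be interpreted with some care.

```r
set.seed(123)

n <- 40
attribute_names <- c("location", "price", "ambience", "service")

# Simulated constant sum answers: every respondent distributes exactly 100 points
pts <- t(rmultinom(n, size = 100, prob = c(0.20, 0.35, 0.20, 0.25)))
colnames(pts) <- attribute_names

# Reshape to long format: one row per respondent and attribute
long <- data.frame(
  respondent = factor(rep(seq_len(n), times = length(attribute_names))),
  attribute  = factor(rep(attribute_names, each = n), levels = attribute_names),
  points     = as.vector(pts)
)

# Descriptive summary: mean points per attribute
print(tapply(long$points, long$attribute, mean))

# Repeated measures (within-subjects) ANOVA: attribute varies within respondent
fit <- aov(points ~ attribute + Error(respondent / attribute), data = long)
print(summary(fit))
```

The effect of `attribute` is reported in the `respondent:attribute` error stratum of the summary.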

#### Text or number entry question {-}

```{r, echo=F, fig.align='center',out.width='72%', fig.cap="Text or number entry question"}
knitr::include_graphics('images/text-entry.PNG')
```

A text or number entry question is recommended if you are interested in obtaining ratio data. We will use this type of question together with a constant sum question to collect data that can be analysed with regression analysis. Note that in this case we treat constant sum data as ratio data and therefore assume that 0 means complete absence.  
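
As a minimal sketch, a regression with a number entry answer as the dependent variable and constant sum points as the predictor could look as follows. The variables `monthly_spend` and `price_points` are hypothetical and the data are simulated for illustration.

```r
set.seed(7)

n <- 100
# Hypothetical constant sum answer: points allocated to "price" (0 to 60)
price_points <- round(runif(n, min = 0, max = 60))
# Hypothetical number entry answer: monthly restaurant spending in euros
monthly_spend <- 150 - 1.2 * price_points + rnorm(n, sd = 15)

# Simple linear regression of spending on the importance of price
fit <- lm(monthly_spend ~ price_points)
print(summary(fit))
```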

### Scaling techniques {-}

Scaling techniques are meant to study the relationship between objects. They are broadly classified into **comparative** and **non-comparative scales**. 

```{r, echo=FALSE, out.width = '90%',fig.align='center'}
knitr::include_graphics("scales.png")
```

In **noncomparative scales**, each object is scaled independently of the other objects. The resulting data are generally assumed to be interval or ratio scaled.

**Comparative scales (or nonmetric scaling)** involve the direct comparison of stimulus objects. For example, the respondent might be asked directly about his or her preference between domestic and foreign beer brands. As a result, the comparative data collected can only be interpreted in relative terms. In the following sections we will walk through both types of scales and briefly introduce them.


#### Comparative scale: Paired Comparison {-}    

* Respondent is presented with two objects and asked to select one according to some criterion.
* The resulting data are ordinal.
* The assumption of transitivity (if X > Y and Y > Z, then X > Z) enables the paired comparison data to be converted into a rank order. To do so, you need to identify the number of times each object is preferred by summing over all paired comparisons.
* Effective when the number of objects is limited, as it requires direct comparisons and a larger number of objects makes the comparisons unmanageable.
* *Example:*  
For each pair, please indicate which of the two brands of beer in the pair you prefer.
```{r, echo=FALSE, fig.align='center', out.width='90%'}
knitr::include_graphics('paired comparison.png')
```
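
Converting paired comparison data into a rank order can be sketched in a few lines of R. The preference matrix below is made up, and its coding (a 1 means that the row brand was preferred over the column brand) is an assumption for this example and may differ from the layout in the figure above.

```r
brands <- c("A", "B", "C", "D")

# prefer[i, j] = 1 means the row brand i was preferred over the column brand j
prefer <- matrix(c(0, 1, 1, 1,
                   0, 0, 1, 1,
                   0, 0, 0, 1,
                   0, 0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(preferred = brands, over = brands))

# Number of times each brand was preferred
wins <- rowSums(prefer)
print(wins)

# More wins means a better (lower) rank
rank_order <- rank(-wins, ties.method = "min")
print(rank_order)
```

With transitive preferences, as in this example, every brand receives a distinct number of wins and therefore a unique rank.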

#### Comparative scale: Rank Order {-}  

* Allow a certain set of brands or products to be simultaneously ranked based upon a specific attribute or characteristic.
* Rank order scaling is a good proxy for the shopping setting, as objects are compared simultaneously.
* The rank order scaling results in the data of ordinal nature.
* *Example:*  
Rank the various brands of beer in order of preference. Begin by picking out the one brand that you like most and assign it the number 1. Then find the second most preferred brand and assign it the number 2. Continue this procedure until you have ranked all the brands of beer in order of preference.
No two brands should receive the same rank number.

```{r, echo=F, fig.align='center',out.width='50%'}
knitr::include_graphics('rank-order-scale.png')
```

#### Comparative scale: Constant sum {-}  

* Respondents allocate a constant sum of units (e.g., points, dollars) among a set of stimulus objects with respect to some criterion.  
* Constant sum is similar to rank order, but it carries specific units.  
* The resulting data does not just indicate important factors, but also by how much a factor supersedes another one.  
* Constant sum scaling can be used to observe the comparative significance respondents assigned to various factors of a subject.  
* *Example:*  
There are 8 attributes of bottled beers. Please allocate 100 points among the attributes so that your allocation reflects the relative importance you attach to each attribute.

```{r, echo=F, fig.align='center',out.width='80%'}
knitr::include_graphics('constant-sum-scale.png')
```

* Basic analysis of constant-sum data involves tabulating the responses and presenting them either as quantities (e.g., "on average, 7 points were allocated to 'high alcohol level'") or as proportions (e.g., "on average, 7% of points were allocated to 'high alcohol level'").  
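
The tabulation described above can be sketched as follows. The data frame `alloc` and its four attributes are hypothetical and deliberately small; each row is one respondent who distributed 100 points.

```r
# Made-up constant sum answers of three respondents (100 points each)
alloc <- data.frame(
  taste         = c(40, 50, 30),
  price         = c(30, 20, 40),
  alcohol_level = c(10, 5, 6),
  packaging     = c(20, 25, 24)
)

# Sanity check: every respondent allocated exactly 100 points
stopifnot(all(rowSums(alloc) == 100))

# Presented as quantities: average number of points per attribute
mean_points <- colMeans(alloc)
print(mean_points)

# Presented as proportions: average share of points per attribute
mean_share <- mean_points / sum(mean_points)
print(round(100 * mean_share, 1))
```

In this toy example, 7 points (7% of all points) are allocated to `alcohol_level` on average.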


#### Non-Comparative Scales: Continuous Rating Scales {-}  

* Participants rate the objects by placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the other.  
* One of the advantages of the continuous rating scale is that it is easy to administer.  

```{r, echo=F, fig.align='center',out.width='70%'}
knitr::include_graphics('continuous-rating-scale.png')
```

* Once the ratings are collected, you can split them into categories and then assign scores depending on the category into which each rating falls.
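
Splitting continuous ratings into categories can be done with `cut()` in base R. The ratings, cut points and labels below are assumptions chosen for illustration; in practice, the category boundaries should follow from your research question.

```r
# Hypothetical ratings from a continuous rating scale running from 0 to 100
ratings <- c(12, 48, 55, 81, 97, 33)

# Split into three categories: [0, 33], (33, 66] and (66, 100]
rating_cat <- cut(ratings,
                  breaks = c(0, 33, 66, 100),
                  labels = c("low", "medium", "high"),
                  include.lowest = TRUE)

print(data.frame(ratings, rating_cat))
print(table(rating_cat))
```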


#### Non-Comparative Scales: Itemized Rating Scales {-} 

* The respondents are provided with a scale that has a number or brief description associated with each category.  
* The categories are ordered in terms of scale position, and the respondents are required to select the specified category that best describes the object being rated.  
* The commonly used itemized rating scales are **the Likert, semantic differential and Stapel scales.**

##### Itemized Rating Scales: Likert scale {-}

* Requires respondents to indicate their attitude towards the given object through their degree of agreement or disagreement with each of a series of statements, typically using five or seven response categories.  
* Reverse-coding some items can increase validity.  
* One limitation is the time required to answer. Compared to other itemized scaling techniques, the Likert scale is more time-consuming, as each respondent is required to read every statement in the questionnaire before assigning a numerical value to it.

```{r, echo=F, fig.align='center',out.width='70%'}
knitr::include_graphics('likert.png')
```
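
If some statements are worded negatively, they need to be reverse-coded before the items are combined into one score. A minimal sketch for a five-point scale, with hypothetical item names (`q1`, `q2_neg`, `q3`) and made-up answers:

```r
# Made-up answers of three respondents on a 5-point Likert scale
# (1 = strongly disagree, 5 = strongly agree); q2_neg is negatively worded
likert <- data.frame(
  q1     = c(5, 4, 2),
  q2_neg = c(1, 2, 5),
  q3     = c(4, 4, 1)
)

# Reverse-code the negatively worded item: 1 <-> 5, 2 <-> 4, 3 stays 3
scale_max <- 5
likert$q2_rev <- (scale_max + 1) - likert$q2_neg

# Average the items into one attitude score per respondent
likert$attitude <- rowMeans(likert[, c("q1", "q2_rev", "q3")])
print(likert)
```

For a seven-point scale, `scale_max` would be set to 7 instead.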

In the table below you can find a couple of commonly measured constructs in marketing research such as attitude, importance, purchase intention and similar.

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('likert-marketing-reserach.png')
```


##### Itemized Rating Scales: Semantic Differential {-}

* Typically, participants rate objects on a number of itemized, seven-point rating scales bounded at each end by one of two bipolar adjectives.  

* Semantic differential can measure respondent attitudes towards something (products, concepts, items, people...).

* It helps you find the respondent's position on a scale between two bipolar adjectives such as “Sweet-Sour” or “Bright-Dark”. In comparison to the Likert scale, which uses generic scales (e.g., extremely dissatisfied to extremely satisfied), semantic differential questions are posed within the context of evaluating attitudes.

* It is a widely used rating scale in marketing research due to its versatility.

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('semantic-differential.png')
```

When creating a semantic differential question, you should consider the following:

* **Number of categories:** 

```{r, echo=F, fig.align='left',out.width='72%'}
knitr::include_graphics('semantic-differential-1.png')
```

* **Balanced vs. unbalanced:**

```{r, echo=F, fig.align='left',out.width='72%'}
knitr::include_graphics('semantic-differential-2.png')
```

* **Odd/even number of categories:**

```{r, echo=F, fig.align='left',out.width='72%'}
knitr::include_graphics('semantic-differential-3.png')
```

* **Forced vs. non-forced response**

```{r, echo=F, fig.align='left',out.width='72%'}
knitr::include_graphics('semantic-differential-4.png')
```

* **Verbal description:**

```{r, echo=F, fig.align='left',out.width='72%'}
knitr::include_graphics('semantic-differential-5.png')
```



### Questionnaire structure {-}

The sequence of questions in a questionnaire can play an important role. For instance, more sensitive questions (such as demographic questions) are usually placed at the end, as they can trigger a change in the respondent's behavior. 

If you plan to conduct an online survey, you need to think about the respondent's experience while completing your questionnaire. For instance, spread the content over many short pages rather than a few long ones. In online surveys, two questions per page is a useful rule of thumb. Generally, respondents are reluctant to read and fill out long questionnaire pages. Hence, long pages will lead to a higher dropout rate.
In order to reduce the dropout rate, state approximately how long the survey will take in the introduction of the questionnaire. Note that tools like Qualtrics provide the estimated response time in the survey overview.

::: {.infobox_red .caution data-latex="{caution}"}
Keep in mind that most people use their phones to fill out a questionnaire. Think about how the questionnaire will appear on a phone screen too, and pay particular attention to the length of your questions.
:::

In the end, the questionnaire structure has to be aligned with the research design. For example, if your research design features an experiment, this needs to be reflected in the questionnaire (e.g., you need to assign the respondents randomly to the experimental conditions in case of a between-subjects comparison).

#### Questionnaire structure for a between-subjects design {-}

In a between-subjects design you randomly assign each respondent to one of the experimental conditions. Respondents then complete tasks only in the condition to which they are assigned.

For instance, we would like to test the effect of two advertisements on purchase intention. Therefore, one group of (randomly assigned) respondents will be exposed to one advertisement version while the other group (of randomly assigned respondents) will be exposed to another version. After that, both groups of respondents should express their willingness to buy the advertised product. Eventually, if the dependent variable (e.g., willingness to buy) is measured on an interval or ratio scale, you can use an independent-samples t-test to compare the group means. The whole experimental design should be organised as follows:

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('between-subject-design.png')
```
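
The analysis of such a between-subjects experiment can be sketched with simulated data. The condition labels (`ad_A`, `ad_B`), the variable `wtb` (willingness to buy) and the assumed effect size are hypothetical. In a real survey the random assignment happens in the survey tool; here it is imitated with `sample()`.

```r
set.seed(2024)

n <- 60
# Random assignment: shuffle 30 "ad_A" and 30 "ad_B" labels across respondents
condition <- factor(sample(rep(c("ad_A", "ad_B"), each = n / 2)))

# Simulated willingness to buy (assumed to be interval scaled)
wtb <- ifelse(condition == "ad_A", 4.0, 4.6) + rnorm(n, sd = 1)
experiment <- data.frame(condition, wtb)

# Group means
print(aggregate(wtb ~ condition, data = experiment, FUN = mean))

# Independent-samples t-test (R uses the Welch version by default)
tt <- t.test(wtb ~ condition, data = experiment)
print(tt)
```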


#### Questionnaire structure for a within-subjects design {-}

This type of experimental design involves exposing each respondent to all of the experimental conditions you are testing. This way, each respondent will experience all of the conditions.

For instance, we would like to test again the effect of two advertisements on purchase intentions, but this time in a within-subjects design. First, each respondent will be exposed to the first version of the advertisement and right after that asked to rate his/her willingness to buy the advertised product. Subsequently, each participant will be shown the other version of the advertisement and again rate his/her willingness to purchase the advertised product. Finally, we can compare the means of the two conditions with a paired-samples t-test (given that the data are measured on an interval or ratio scale). 

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('within-subject-design.png')
```
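
For the within-subjects version, each respondent contributes two ratings, so the comparison uses a paired-samples t-test. The sketch below relies on simulated ratings; the variable names (`wtb_ad1`, `wtb_ad2`) and the assumed difference between the advertisements are hypothetical.

```r
set.seed(99)

n <- 40
# Each respondent has an individual baseline level of willingness to buy
baseline <- rnorm(n, mean = 4, sd = 1)

# Simulated ratings of the same respondents after seeing each advertisement
wtb_ad1 <- baseline + rnorm(n, sd = 0.5)
wtb_ad2 <- baseline + 0.4 + rnorm(n, sd = 0.5)

# Paired-samples t-test on the within-respondent differences
pt <- t.test(wtb_ad2, wtb_ad1, paired = TRUE)
print(pt)
```

The test has n - 1 degrees of freedom because it is based on one difference score per respondent.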


### Question wording {-}

Generally, question wording should enable each respondent to understand the questions and to answer them reliably. Reliability means that if a respondent were asked the same question again, he/she would give the same answer. A number of common problems regarding question wording have been identified, so we will address the most important ones. 

In order to ensure reliability, the issue in terms of **who, what, when and where** should be defined in each question.  

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```    

*Example:* Which brand of shampoo do you use?  
**Who (the respondent):** It is not clear whether this question relates to the individual respondent or the respondent’s total household.  
**What (the brand of shampoo):** It is unclear how the respondent is to answer this question if more than one brand is used.  
**When (unclear):** The time frame is not specified in this question. The respondent could interpret it as meaning the shampoo used this morning, this week, or over the past year.  
**Where (not specified):** At home, at the gym? Where?
```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```    

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```  

*A more clearly defined question is:*  
Which brand or brands of shampoo have you personally used at home during the last month? In the case of more than one brand, please list all the brands that apply.

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

**Use ordinary words.** Words should match the vocabulary level of the participants.

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```    

“Do you think the distribution of soft drinks is adequate?”   

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```    


```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```    

“Do you think soft drinks are easily available when you want to buy them?”

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

**Avoid double negatives.** Questions containing a double negative can confuse respondents, especially when they need to answer with “Agree” or “Disagree”.

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```

Do you think that it is not uncommon that boys play basketball?  

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```

In your opinion, is it common that boys play basketball?

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

**Avoid leading questions.** Leading questions clue the participant to what the answer should be. Such questions introduce a bias in a particular direction.  

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```

“Is Colgate your favorite toothpaste?”  

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```

“What is your favorite brand of toothpaste?”

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

**Avoid ambiguous words.** Words such as usually, normally, frequently, often, regularly, and other similar words, do not define frequency clearly enough.

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
Incorrect
\vspace{-0.1in}
```

“In a typical month, how often do you go to a movie theater to see a movie?”  
a) Never  
b) Occasionally  
c) Sometimes   
d) Often   
e) Regularly  

```{block, type="incorrect", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
Correct
\vspace{-0.1in}
```

"In a typical month, how often do you go to a movie theater to see a movie?"  
a) Less than once  
b) 1 or 2 times  
c) 3 or 4 times  
d) More than 4 times

```{block, type="correct", purl=FALSE}
\vspace{-0.1in}
\vspace{-0.1in}
```

### Choose adequate order {-}

One of the last steps in the process of designing a questionnaire is choosing an adequate order of questions and instructions for respondents. 

At the beginning, you should provide a short and easy-to-understand introduction to the topic. Use simple language and avoid technical terms (e.g., not many people will know the terms “manufacturer brand” and “store brand”). Additionally, in the introduction you should state how long the survey will approximately take.

The opening questions should be interesting, simple and non-threatening.
They are crucial because it is the respondent's first exposure to the questionnaire and is likely to set the tone for the rest of questions in the questionnaire. If too difficult to understand, or sensitive in some way, respondents are likely to stop answering your questions. Qualifying questions (or screening questions) should serve as the opening questions (if applicable). Their purpose is to identify a potential respondent that is eligible to proceed with the research survey.

After the opening part, you should establish an optimal question flow.
General questions should precede the specific questions. Questions on one subject, or one particular aspect of a subject, should be grouped together. It may feel confusing to be asked to return to some subject they thought they already gave their opinions about.

As respondents are moving towards the end of the questionnaire, they are likely to become increasingly indifferent and might give careless answers. Therefore, questions of special importance should ideally be included in the earlier part of the questionnaire. 

Finally, make sure that you provide all necessary definitions and explanations before you ask a question. This ensures that every respondent understands the questions in a consistent way.

### Test your questionnaire {-}

Before you distribute the final questionnaire, there are some things to consider. Most importantly, you should always pretest your questionnaire before sharing it!
Test all aspects of the questionnaire (content, wording, sequence, form & layout, etc.). If possible, use respondents in the pretest who are similar to those who will be included in the actual survey. The pretest sample can be small (in a real scenario this could vary from 15 to 30 respondents; for the group project, a lower number will be sufficient). After each significant revision of the questionnaire, conduct another pretest, using a different sample of respondents. Finally, code and analyze the responses obtained from the pretest to make sure that you collected the information you intended to collect.

After testing your questionnaire you should be able to determine whether:

* The questions are properly framed  
* The question wording triggers any biases  
* The questions are placed in the optimal order  
* The questions are understandable  
* Additional specifying questions are needed or some questions need to be eliminated  

## Pitch, revision & submission

At this stage, you should be ready for pitching your questionnaire. Please keep in mind the timetable.

```{r eval = TRUE, echo = FALSE, warning=FALSE, message = FALSE}
library(dplyr)
library(kableExtra)
mytable_sub = data.frame(
    Date_A = c("Oct. 21", 
             "Oct. 23*", 
             "Nov. 1"
             ),
    Time_A = c(
      "11:59PM","09:00AM - 02:30PM","11:59PM"
    ),Date_B = c("Oct. 25", 
             "Oct. 27*", 
             "Nov. 4"
             ),
    Time_B = c(
      "11:59PM","02:00PM - 08:00PM","11:59PM"
    ),
    Task = c(  "* Submit questionnaire draft", 
               "* Coaching: Questionnaire design (live video coaching)", 
               "* Submit revised questionnaire"
               ),
    Chapters = c("10","10","10"),
    Link = c("",
                 "TBC", 
                 ""
                 )
    )

#pander::pander(mytable_sub, keep.line.breaks = TRUE, style = 'grid', justify = 'left')
mytable_sub %>% kable(escape = T) %>%
  kable_paper(c("hover"), full_width = F) %>%
  footnote(general = "Dates and times are indicated for groups A and B respectively.
           Sessions indicated with '*' are group coaching sessions. Slots of 45 min. are assigned to each group within the indicated times.",
           general_title = "Note: ", 
           footnote_as_chunk = T, title_format = c("italic")
           ) 
#%>%   row_spec(c(1,3,6), background = "#E0E0E0")
```


# Part 2: Collecting data and analysis {-}

## Data analysis {-}

```{r, echo = FALSE, results='asis', warning=FALSE ,error=FALSE}

# Load in qualtRics package
library(qualtRics)
library(janitor)
library(sjlabelled)

# Read the qualtrics survey data
qualtrics<-read_survey('new_qualtrics_response_data_final_final.csv')

# Using labels as column name

new.colnames <-colnames(label_to_colnames(qualtrics))
new.colnames <- make.unique(new.colnames, sep="_")
colnames(qualtrics)<- new.colnames

```

In this chapter we look at the kind of data you collect with each question type when conducting a survey. This will help you choose a question type depending on the kind of data you want to collect and on the statistical tests you want to apply.


### Multiple choice with a single answer {-}

Multiple Choice with a single answer is a type of closed-ended question that lets respondents select **one answer** from a defined list of choices.

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('support-multiple-choice-question.png')
```

The type of data you obtain is **categorical**, and the output comes in the following form:  

```{r, echo=FALSE,warning=FALSE, error=FALSE, fig.align='center',eval=FALSE}
qualtrics[1:6,c("In a typical week, how many hours do you spend watching movies or TV series on Netflix?")] %>%
  kableExtra::kbl(align = "c") %>%
  kable_paper("hover", full_width = F)
```

What to do with this data now? First, we need to load it into R and prepare it for analysis. R recognizes the numbers you see in the output **as numeric**. In order to conduct statistical modeling and properly visualize our results, we need to convert our data to **a factor class.**

A factor (or coding variable) represents different groups of data by using numbers (integers). Factors are stored as numbers, but these numbers carry the labels/names of the data groups, i.e., they represent a nominal variable. The data groups are called 'levels'.  
In our case, the multiple choice question output will contain 5 data groups ('Never', '1-2 hours', '3-4 hours', '5-6 hours', 'more than 6 hours') after converting it to a factor:

```{r, eval=TRUE, warning=FALSE, message=FALSE}
# Convert numeric value to factors
qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?' <- factor(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?', levels = c(1:5), labels = c('Never','1-2 hours','3-4 hours','5-6 hours','more than 6 hours'))

qualtrics$` Selected Choice_1` <- factor(qualtrics$` Selected Choice_1`,levels = c(1:2),labels = c("Male","Female"))

qualtrics$` Selected Choice` <- factor(qualtrics$` Selected Choice`, levels = c(1:2), labels=c("Austria","Germany"))


# Table
table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')
table(qualtrics$` Selected Choice`)     #countries
table(qualtrics$` Selected Choice_1`)   #gender

```
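
To see in isolation what `factor()` does, the following minimal sketch converts a small vector of made-up answer codes into a labelled factor. The vector `toy_codes` is hypothetical and not part of the survey data.

```{r}
# Hypothetical numeric answer codes, as exported by a survey tool
toy_codes <- c(1, 3, 2, 5, 3, 3, 1)

# Attach a label to each code; codes not listed in `levels` would become NA
toy_factor <- factor(toy_codes,
                     levels = 1:5,
                     labels = c("Never", "1-2 hours", "3-4 hours",
                                "5-6 hours", "more than 6 hours"))

class(toy_factor)   # "factor" instead of "numeric"
levels(toy_factor)  # the five answer categories
table(toy_factor)   # counts per category, including categories nobody chose
```

Note that `table()` also lists the category "5-6 hours" with a count of zero, because the factor knows all possible levels even if no respondent selected one of them.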

Second, you might want to visualize your results. To do so, the data needs to be in an appropriate format. Here we continue with adapting the data format from the point where we stopped:

```{r, eval=TRUE, warning=FALSE, message=FALSE}
# Converting long format to the visualisation-friendly format
mlc_visualisation <- as.data.frame(table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?'))

# Naming columns
names(mlc_visualisation) <- c('Time','Count')

# Observing
knitr::kable(mlc_visualisation)

```

The simplest way to visualize data obtained from a multiple choice question with a single answer is **a bar chart**:

```{r}
## Basic bar chart
labels <- as.character(mlc_visualisation$Time) #Save labels for x-axis in the barplot
barplot(mlc_visualisation$Count, # Column to visualize
        xlab='Time', # X-axis label
        ylab = 'Count(answers)', # Y-axis label
        names.arg = labels,
        main = 'How many hours do you spend watching movies or series on Netflix?') # Title
```

The R package **ggplot2** allows you to create visually appealing graphs:

```{r}
## ggplot2 bar chart
library(ggplot2)
p <- ggplot(data=mlc_visualisation, 
             aes(x=Time, y=Count, fill=Time)) +
             geom_bar(stat='identity') + theme_minimal() + labs(title = "In a typical week, how many hours do you spend watching movies or series on Netflix?")
p
```

Another R library which can help you make interactive charts quickly is **plotly**. Here we use a function called **ggplotly()**, which turns a **ggplot2** chart into an interactive one. Since we have already created a bar chart using ggplot2 and saved it as "p", we will just turn it into a plotly graph:

```{r,warning=F,message=F}
## ggplotly bar chart

library(plotly)
ggplotly(p)
```


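Another package with a similar grammar is **ggvis**, which builds interactive graphics for the web. Note that it has not been actively developed for some time, so ggplot2 combined with plotly is usually the safer choice: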
```{r, warning=FALSE, message=FALSE}
## ggvis bar chart

library(ggvis)
ggvis(mlc_visualisation, 
      x = ~Time, 
      y = ~Count, 
      fill=~Time)
```



The data collected with the previous question is ordinal, as the levels have a natural order. Ordinal data is a type of categorical data. For the analysis of categorical data we can use the Chi-square test, or Fisher's exact test if the (expected) count in at least one cell is less than 5. 
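
The choice between the two tests can be sketched as follows. The table `toy_table` contains made-up counts, and the rule of thumb is applied here to the expected counts under independence, which is how it is usually stated.

```{r}
# Hypothetical 2 x 3 contingency table (made-up counts)
toy_table <- matrix(c(12, 3,  9,
                       4, 2, 10),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(Country = c("Austria", "Germany"),
                                    Hours   = c("Low", "Medium", "High")))

# Expected counts under independence: row total * column total / grand total
toy_expected <- outer(rowSums(toy_table), colSums(toy_table)) / sum(toy_table)
round(toy_expected, 2)

# Rule of thumb: use Fisher's exact test if any expected count is below 5
if (any(toy_expected < 5)) {
  toy_result <- fisher.test(toy_table)
} else {
  toy_result <- chisq.test(toy_table)
}
toy_result$method
toy_result$p.value
```

In this toy table the "Medium" column has expected counts below 5, so the sketch falls back to Fisher's exact test.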

#### Fisher's exact test {-}

Fisher's exact test is used to test a hypothesis with data obtained from multiple choice questions with a single answer. Results from multiple choice questions with multiple answers are analysed with a different test.
<ul><li> <B> Application: </B> when you have <B> 1 dependent variable and 1 independent variable with 2 or more levels/factors </B></li></ul>
<ul><li> Used when the (expected) frequency in at least one cell is <B> less than 5 </B>. When the frequencies in all cells are greater than 5, the Chi-square test should be used.</li></ul>
<ul><li> <B>Hypothesis:</B> Is there a significant difference in frequencies between values observed in cells and values expected in cells? (R for Marketing Research and Analytics)</li></ul>
<ul><li> <B>H0:</B> There is no relationship between the two categorical variables. Therefore, the two categorical variables are <B> independent.</B> Knowing the value of one variable does not help to predict the value of the other variable.</li></ul>
<ul><li> <B>H1:</B> There is a relationship between the two categorical variables. Therefore, the two categorical variables are <B> dependent.</B> Knowing the value of one variable helps to predict the value of the other variable.</li></ul>
<ul><li> Usually, this type of test is used on 2x2 contingency tables. However, it can also be applied to contingency tables of larger dimensions.</li></ul>

<B>Example:</B> We would like to know whether the number of hours spent watching Netflix depends on the respondents' country of origin.


```{r}
# Creation of contingency table
fisher_test_table <-table(qualtrics$` Selected Choice`,qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')
# Check what our contingency table looks like
fisher_test_table

# Since we have a count less than 5, we should apply Fisher's test instead of Chi-square.

# Fisher's test
test <- fisher.test(fisher_test_table)
test

# p-value
test$p.value
```

From the output and from `test$p.value` we see that the p-value is higher than the significance level of 5%. As with any other statistical test, if the p-value is higher than the significance level, we cannot reject the null hypothesis.

In our case, not rejecting the null hypothesis of Fisher’s exact test of independence means that we have no evidence of a significant relationship between the two categorical variables. Knowing the value of one variable therefore does not appear to help predict the value of the other variable.

#### Chi-square test: Goodness of fit & Independence test {-}

1) Goodness of fit
<div><ul><li><B> Application: </B>when you only have <B> 1 dependent variable and no independent variable </B></li></ul>
<ul><li> <B> Hypothesis:</B> Is there a significant difference in frequencies between values observed in cells and values expected in cells? </li></ul>
<ul><li> <B> H0: </B> There is no significant difference between the observed and the expected frequencies.</li></ul>
<ul><li> <B> H1: </B> There is a significant difference between the observed and the expected frequencies. </li></ul>
<ul><li> If we don't specify the expected frequency per cell (see the code below), all cells are expected to show an equal frequency. </li></ul>
<ul><li> <B> Example</B>: 'Do the numbers of respondents who spend different amounts of hours watching Netflix <B> significantly differ from each other?</B>'</li></ul></div>
<ul><li><B> Note that we did not assume any specific distribution, so we are assuming that each category has the same or a similar count. </B></li></ul>

```{r} 
# Creating table 
(mlc_chi_square <- table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?'))
      
# Chi-square test (without given expected values = equal values )
chisq.test(mlc_chi_square)
```

The p-value of the test is higher than 0.05, so we cannot reject the null hypothesis that all answer categories are equally frequent. The observed distribution does not differ significantly from the expected (uniform) distribution. This result is not surprising if you look at the counts for each level in the table we created before conducting the test: the counts do not deviate much from each other. This is also visible in the previous visualisations.


If we are interested in testing a more specific distribution, e.g., if we expect that 40% of our respondents watch Netflix 3-4 hours a week, we can pass the corresponding distribution to the test. 

```{r}
# Expected proportions for each alternative. The sum must be 1.
expected_values <- c(0.10, # We expect that 10% of our respondents do not watch Netflix at all ("Never").
                     0.20, # We expect that 20% of our respondents watch Netflix 1-2 hours a week.  
                     0.40, # We expect that 40% of our respondents watch Netflix 3-4 hours a week.
                     0.20, # We expect that 20% of our respondents watch Netflix 5-6 hours a week.
                     0.10 # We expect that 10% of our respondents watch Netflix more than 6 hours a week.
                    )
# Chi-square test with expected values
chisq.test(mlc_chi_square, p=expected_values)
```

This time the p-value of the test is lower than 0.05. We have evidence that the observed distribution differs significantly from the expected distribution (10%/20%/40%/20%/10%).  
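
To make the logic of the goodness-of-fit test more tangible, here is a minimal sketch that computes the test statistic by hand and compares it with `chisq.test()`. The counts in `toy_observed` are made up for illustration.

```{r}
# Hypothetical observed counts for five answer categories
toy_observed <- c(10, 25, 30, 22, 13)
toy_probs    <- c(0.10, 0.20, 0.40, 0.20, 0.10)  # hypothesised shares, sum to 1

# Expected counts and the chi-square statistic: sum((O - E)^2 / E)
toy_exp  <- sum(toy_observed) * toy_probs
toy_stat <- sum((toy_observed - toy_exp)^2 / toy_exp)

# p-value from the chi-square distribution with (categories - 1) degrees of freedom
toy_p <- pchisq(toy_stat, df = length(toy_observed) - 1, lower.tail = FALSE)

# The manual result should match chisq.test()
toy_gof <- chisq.test(toy_observed, p = toy_probs)
c(manual = toy_stat, chisq.test = unname(toy_gof$statistic))
c(manual = toy_p, chisq.test = toy_gof$p.value)
```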


2) Chi-Square Test of Independence
<div><ul><li> <B> Application: </B>when you have <B> 1 dependent variable and 1 independent variable with 2 or more levels/factors </B></li></ul>
<ul><li> <B> Hypothesis: </B> Is there an association between categorical variable X and categorical variable Y? </li></ul>
<ul><li> <B> H0: </B> There is no association between the two variables.</li></ul>
<ul><li> <B> H1: </B> There is an association between the two variables.</li></ul>
<ul><li> <B> Example: </B> Is there an association between gender and the hours spent watching Netflix during a week? </li></ul></div>

```{r}
# Creation of contingency table
chi_square_table <-table(qualtrics$` Selected Choice_1`,qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')

# Chi-square independence test
chisq.test(chi_square_table)
```

Since the p-value (0.8135) is higher than the significance level (0.05), we cannot reject the null hypothesis. Thus, we find no evidence of an association between gender and the number of hours spent watching Netflix. In other words, the data are consistent with the hours spent being independent of the participant's gender.

### Multiple choice with multiple answers {-}

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('multiple-choice-question-multiple-answers.png')
```

Before we conduct any test, we will do some simple calculations and visualise our data. 

```{r}
# Rename columns
colnames(qualtrics)[38] <- "ja!Naturlich"
colnames(qualtrics)[39] <- "Clever"
colnames(qualtrics)[40] <- "Spar Vital"
colnames(qualtrics)[41] <- "..."

# Replacing NA with 0
qualtrics$`ja!Naturlich`[is.na(qualtrics$`ja!Naturlich`)]=0
qualtrics$Clever[is.na(qualtrics$Clever)]=0
qualtrics$`Spar Vital`[is.na(qualtrics$`Spar Vital`)]=0
qualtrics$...[is.na(qualtrics$...)]=0

# Calculating frequency, percentage of responses and percentage of cases (= respondents)
df.cochran <- data.frame(Frequency = colSums(qualtrics[38:41]),
                         Share_of_responses = (colSums(qualtrics[38:41])/sum(qualtrics[38:41]))*100,
                         Share_of_cases = (colSums(qualtrics[38:41])/nrow(qualtrics[38:41]))*100)
# Observing
df.cochran

# Visualisation
barplot(df.cochran[,3], names.arg = row.names(df.cochran), main = "% of Respondents familiar with brands", xlab = "Brand",ylab = "Percentage")
```

The visualisation above shows that more than 60% of respondents are familiar with the brand "ja!Naturlich", while we cannot say the same for the other brands considered in our question. 


For the analysis of results collected with a multiple choice question with multiple possible answers, we can use **Cochran's Q test**. Although we did not mention it before, it is not too different from what you have already learned about other tests. 

The Cochran’s Q test and associated multiple comparisons require the following assumptions:

1. Responses are dichotomous and come from k matched samples.
2. The subjects are independent of one another and were selected at random from a larger population.
3. The sample size is sufficiently “large”. (As a rule of thumb, the number of subjects for which the responses are not all 0’s or all 1’s, n, should be ≥ 4 and nk should be ≥ 24.)

In a within-subjects experimental design with three or more observations of a dichotomous (= just two levels such as "Yes" or "No") categorical outcome, you use Cochran's Q test to assess main effects. Similarly, in our multiple choice question with multiple answers, the same respondent goes through three or more potential answers, each with a dichotomous (= yes or no) categorical outcome. 
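
To illustrate what the test computes, the following sketch calculates Cochran's Q by hand in base R instead of relying on a package function. The matrix `toy_answers` is made up and deliberately tiny; it is smaller than the rule of thumb above recommends, so it only demonstrates the arithmetic.

```{r}
# Hypothetical 0/1 answers: rows = respondents, columns = brands (made-up data)
toy_answers <- matrix(c(1, 0, 0,
                        1, 1, 0,
                        1, 0, 0,
                        1, 1, 1,
                        0, 0, 0,
                        1, 0, 1,
                        1, 0, 0,
                        1, 1, 0),
                      ncol = 3, byrow = TRUE,
                      dimnames = list(NULL, c("Brand A", "Brand B", "Brand C")))

toy_k       <- ncol(toy_answers)     # number of answer options
toy_col_tot <- colSums(toy_answers)  # how often each brand was ticked
toy_row_tot <- rowSums(toy_answers)  # how many brands each respondent ticked
toy_n_ticks <- sum(toy_answers)      # total number of ticks

# Cochran's Q statistic and its chi-square p-value with k - 1 degrees of freedom
toy_Q <- (toy_k - 1) * (toy_k * sum(toy_col_tot^2) - toy_n_ticks^2) /
  (toy_k * toy_n_ticks - sum(toy_row_tot^2))
toy_Q_p <- pchisq(toy_Q, df = toy_k - 1, lower.tail = FALSE)

c(Q = toy_Q, p.value = toy_Q_p)
```

Respondents who ticked all brands or none of them do not change the value of Q, which is why only the remaining respondents count towards the sample size rule of thumb.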


```{r}
library(nonpar)

# Creation of matrix
#matrix.cochran <- cbind(qualtrics$`ja!Naturlich`,
#                   qualtrics$Clever,
#                   qualtrics$`Spar Vital`,
#                   qualtrics$`...`)
# Turning NAs to 0
#matrix.cochran[is.na(matrix.cochran)]=0

# Cochran test                   
#cochrans.q(matrix.cochran, alpha = 0.05)

```

A p-value below 0.05 indicates that there is enough evidence to conclude that some of the store brands are better known among our respondents than others. In order to take a closer look at this, we need to conduct a post hoc test.

```{r}
library(DescTools)
list.cochran <- list(qualtrics$`ja!Naturlich`,
                   qualtrics$Clever,
                   qualtrics$`Spar Vital`,
                   qualtrics$...) # imaginary brand

# Replacing NAs in the list with 0 in order to be able to run the test
list.cochran <- rapply(list.cochran, f=function(x) ifelse(is.na(x),0,x), how="replace" )

# Post hoc test (Dunn Test)
DunnTest(list.cochran, method="bonferroni")

```

From the results of the Dunn test, we can see that there is a clear difference between 1 ("ja!Natürlich") and 4 ("..."), as well as between 4 ("...") and 3 ("Spar Vital"). 

### Rank order question {-}

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('rank-order-question.png')
```

A rank order question asks respondents to compare items to each other by placing them in order of preference. Note that the data obtained from a rank order question shows the order of a respondent's preferences, but not the size of the difference between items. For instance, if the most important feature of a fitness tracker for respondent XY is "Measuring steps" and the second most important feature is "Calories burned", we do not know how much more important the former is compared to the latter.

An intuitive question to ask is the following: which feature of the fitness tracker is the most important for our respondents?

We can answer this question by calculating a mean rank for each feature. Before we do so, we will create a separate data frame and add columns of the response data.
```{r}
rank.data <- data.frame(qualtrics$` Measuring steps`,
                        qualtrics$` Calories burned`,
                        qualtrics$` Measuring heartbeat`,
                        qualtrics$` Exercise tracking`,
                        qualtrics$` Measuring distance`)
colnames(rank.data)<-c("Measuring steps","Calories burned","Measuring heartbeat","Exercise tracking","Measuring distance")

```

The first thing we would like to know is how many different preference combinations there are and how often each of them occurs. We can obtain that information by creating a summary of the ranking data frame we created. 
```{r}
library(pmr)
test <- rankagg(rank.data)
test
```

The matrix we received as an output is the summary of our ranking data. It shows that, for instance, the preference combination "2,1,3,4,5" occurs 10 times in the data frame. More specifically, it means that there are 10 respondents who prefer item 2 ("Calories burned") the most, then item 1 ("Measuring steps"), and so on.

Now we can calculate the mean rank for each feature and conclude which feature is the most important to our respondents:

```{r}
# Mean rank of each fitness tracker feature
destat(test)$mean.rank
```

As we can observe from the output, item 1 ("Measuring steps") shows the best mean rank among all items. Therefore, we can assume that "Measuring steps" is the most important feature for our respondents. However, in order to test whether this result could be due to mere chance, we can conduct the **Friedman rank sum test**.

The Friedman rank sum test is used to identify whether there are any statistically significant differences between the distributions of 3 or more paired groups. It is used when the normality assumptions for a one-way repeated measures ANOVA are not met, or when the dependent variable is measured on an ordinal scale, as in our case.
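
As a small self-contained illustration, the sketch below computes mean ranks and runs the Friedman test on a made-up ranking matrix. `toy_ranks` is hypothetical; each row is one respondent and a rank of 1 means "most important".

```{r}
# Hypothetical rankings: rows = respondents, columns = items (1 = most important)
toy_ranks <- matrix(c(1, 2, 3,
                      1, 3, 2,
                      2, 1, 3,
                      1, 2, 3,
                      1, 3, 2,
                      2, 1, 3),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(NULL, c("Steps", "Calories", "Heartbeat")))

# Mean rank per item: a lower value means the item is ranked as more important
colMeans(toy_ranks)

# Friedman rank sum test: rows are treated as blocks (respondents),
# columns as groups (items)
toy_friedman <- friedman.test(toy_ranks)
toy_friedman$statistic
toy_friedman$p.value
```

When a matrix is passed to `friedman.test()`, no reshaping into the long format is needed.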

Before we conduct the Friedman rank sum test, we will visualise our data:
```{r,echo=FALSE}
# Preparing data frame for Friedman rank sum test
library(reshape2)
library(ggpubr)
library(rstatix)
library(ggstatsplot)
```


```{r}
# Turn the data frame from the wide format into the long format using melt().
# The head and tail of the new data frame show that it contains just two columns, "Feature" and "Rank".

rank.data.long <- reshape2::melt(rank.data,value.name = "Rank",variable.name = "Feature", stringsAsFactors=TRUE)

tail(rank.data.long)
head(rank.data.long)

# Visualisation
ggstatsplot::ggwithinstats(
  data = rank.data.long,
  x = Feature,
  y = Rank,
  type = "np",
  pairwise.comparisons = TRUE, # show pairwise comparison test results
  title = "What features are important to you when evaluating fitness trackers?")
```

The advanced visualisation, which includes the Friedman rank sum test and pairwise comparisons, already gives us an insight into the significance of the differences among features.  

```{r}
# Friedman test 
friedman.test(as.matrix(rank.data))
```

The Friedman rank sum test has a p-value lower than 0.05, so we can conclude that there are significant differences between at least two features (as we have already seen in our visualisation). Even though we have already identified differences between preferences for features in our advanced visualisation, we will conduct a post hoc test in order to demonstrate the traditional way of calculating pairwise comparisons.


```{r}
knitr::kable(wilcox_test(Rank ~ Feature, paired = TRUE, p.adjust.method = "bonferroni", data = rank.data.long))
```
The output table provides us with p-values for the difference in ranks within each pair of features. For instance, the first 4 rows show that the differences between the feature "Measuring steps" and each of the other features are significant. Consequently, we can conclude that this feature is the most important one for our respondents. 


Another question that may be interesting to explore is whether there are any complementary features, or features which overlap in their functionality. To look into this, we can investigate the correlation between the ranks assigned to each feature.
```{r}
#Correlation Matrix
cor.matrix<-cor(rank.data, method=c('spearman'))
cor.matrix
```

At first glance we can observe a lot of negative values, meaning that many features correlate negatively with each other. In order to make the interpretation easier, we will visualise the correlations in the form of a correlation matrix plot.

```{r}
library(ggcorrplot)
ggcorrplot(cor.matrix)
```

From the correlation matrix we can confirm that almost all features correlate negatively with each other. Part of this is mechanical: because every respondent has to assign each rank exactly once, ranking one feature higher necessarily pushes other features down. An exception is the relationship between the features "Measuring steps" and "Exercise tracking", which correlate positively. This matrix can be useful for digging deeper into the relationship between preferences for features. For instance, we can assume that "Measuring steps" and "Exercise tracking" correlate positively because users see them as complementary features. Moreover, if we say that walking is a type of exercise (in case of longer walking routes), we can assume that users who ranked "Exercise tracking" high ranked "Measuring steps" high as well, because they perceive it as another type of "Exercise tracking".
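
When interpreting rank correlations, it helps to know what they look like when respondents have no real preferences at all. The following simulation is a sketch under that assumption: every simulated respondent ranks five items completely at random (all names prefixed with `toy_` are hypothetical).

```{r}
set.seed(42)
toy_items <- 5      # number of ranked items
toy_n     <- 2000   # number of simulated respondents

# Each row is a random permutation of 1..5, i.e. a respondent without preferences
toy_random_ranks <- t(replicate(toy_n, sample(toy_items)))

toy_cor <- cor(toy_random_ranks, method = "spearman")

# Average correlation between different items, compared with -1 / (items - 1)
mean(toy_cor[upper.tri(toy_cor)])
-1 / (toy_items - 1)
```

Even purely random rankings therefore produce moderately negative correlations, so it is mainly the deviations from this baseline (such as a positive correlation) that carry information about how features relate to each other.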

### Constant Sum question {-}

```{r, echo=F, fig.align='center',out.width='72%'}
knitr::include_graphics('constant-sum-question.png')
```

If you wish to obtain information about how much one attribute is preferred over another one, you may use a constant sum scale. The total box should always be displayed at the bottom to make it easier for respondents to check their allocation. A constant sum question permits the collection of ratio data. With the data obtained we are able to express the relative importance of the options.
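
Before analysing constant sum data, it can be worth checking that every respondent's points actually add up to the required total. A minimal sketch with made-up data follows (the data frame `toy_points` and the total of 100 are assumptions for illustration):

```{r}
# Hypothetical constant sum answers (points distributed over four factors)
toy_points <- data.frame(Location = c(20, 10, 40),
                         Price    = c(40, 50, 30),
                         Ambience = c(10, 20, 20),
                         Service  = c(30, 20, 20))

# Each respondent's points should add up to the constant (here: 100)
toy_totals <- rowSums(toy_points)
toy_totals
toy_valid <- toy_totals == 100
toy_valid

# Keep only valid answers and compute the mean allocation per factor
colMeans(toy_points[toy_valid, ])
```

Most survey tools can enforce the total during data collection, in which case this check simply confirms that the export is intact.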

```{r, echo=FALSE,warning=FALSE, error=FALSE, fig.align='center'}
constant.sum <- subset(qualtrics, select =c(" Location"," Price"," Ambience"," Customer Service"))
constant.sum$id <- seq(1:nrow(constant.sum))
knitr::kable(constant.sum[1:6,], caption = "Constant Sum Question")
```

```{r}
# Compute descriptive statistics
library(pastecs) 
res <- stat.desc(constant.sum)
round(res[,1:4],2)
```

```{r,error=FALSE,warning=FALSE, message=FALSE}
# Creation of the long version of data frame
constant.sum.long <-melt(constant.sum[,-5], variable.name ="Factor" ,value.name = "Points")
constant.sum.long
```


```{r,error=FALSE,warning=FALSE, message=FALSE}
# Boxplot ggplot2
p<-constant.sum.long %>% 
  filter(Factor!="id") %>%
  ggplot(aes(x=Factor, y=Points, fill= Factor)) +
    geom_boxplot()  +
    theme_minimal() +
    ggtitle("What factors do you consider when choosing a place to go for a dinner?") +
    xlab("")
ggplotly(p)
```

With the data collected we are able to answer the question: what factor is the most important for our respondents when they go out for a dinner?

```{r}
library(robCompositions)
constSum(constant.sum,100)[,-5]
```

In order to answer this question we need to conduct **a repeated measures ANOVA**.
This type of ANOVA is used for analyzing data where the same subjects are measured more than once. In our case, every respondent is measured on each of the factors (location, price, ambience and customer service). Repeated measures ANOVA is an extension of the paired-samples t-test and is also referred to as a within-subjects ANOVA. In a within-subjects experimental design, the same individuals are measured on the same outcome variable at different time points or under different conditions.
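
The link to the paired-samples t-test can be made concrete with a small sketch. It uses made-up data with only two conditions and base R's `aov()` with an `Error()` term (not the `anova_test()` function used in this chapter); in this special case the F statistic of the repeated measures ANOVA equals the squared t statistic of the paired t-test.

```{r}
set.seed(1)
toy_n_subj <- 12
# Hypothetical long-format data: every subject is measured under conditions A and B
toy_rm <- data.frame(
  id   = factor(rep(1:toy_n_subj, times = 2)),
  cond = factor(rep(c("A", "B"), each = toy_n_subj)),
  y    = c(rnorm(toy_n_subj, mean = 50, sd = 10),
           rnorm(toy_n_subj, mean = 55, sd = 10))
)

# Paired-samples t-test (rows are ordered by subject within each condition)
toy_tstat <- unname(t.test(toy_rm$y[toy_rm$cond == "A"],
                           toy_rm$y[toy_rm$cond == "B"],
                           paired = TRUE)$statistic)

# Repeated measures ANOVA: 'cond' is tested within subjects ('id')
toy_rm_fit <- aov(y ~ cond + Error(id / cond), data = toy_rm)
toy_F <- summary(toy_rm_fit)[["Error: id:cond"]][[1]][["F value"]][1]

c(t_squared = toy_tstat^2, F = toy_F)
```

With more than two conditions the t-test is no longer applicable, while the ANOVA still is.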

We need to check the assumptions that must be fulfilled in order to use this type of ANOVA. There are three assumptions to check. The first is that the dependent variable is approximately normally distributed at each level of the independent variable. Since we have more than 30 observations at each level, we rely on the central limit theorem and do not test this further. The second assumption is that there are no extreme outliers. The third assumption, sphericity (equal variances of the differences between all pairs of levels), is assessed with Mauchly's test, which `anova_test()` reports as part of its output. Let's first have a look at potential outliers:

```{r}
# Outliers
constant.sum.long %>% 
  group_by(Factor) %>%
  identify_outliers(Points)
```

As we cannot identify any extreme outliers, we can proceed with deploying repeated measures ANOVA.

```{r}
# Formatting data (wide to long format; gather() comes from the tidyr package)
constant.sum.aov <- tidyr::gather(constant.sum, key = "Factor", value = "Points", ` Location`,` Price`,` Ambience`,` Customer Service`)

# One-way repeated measures ANOVA  
res.aov <- anova_test(data = constant.sum.aov, dv = Points,wid = id ,within = Factor)
get_anova_table(res.aov)

# Post hoc test
pairwise.t.test(constant.sum.long$Points,constant.sum.long$Factor, paired = T, p.adjust.method = "holm")
```
Now we can see that our respondents give more weight to price than to location or ambience, while customer service is perceived as almost as important as price.

```{r}
ggstatsplot::ggwithinstats(
  data = constant.sum.long %>% filter(Factor!="id"), # excluding "id" column from the data
  x = Factor,
  y = Points,
  type = "p",
  pairwise.comparisons = TRUE, # show pairwise comparison test results
  title = "What factors do you consider when choosing a place to go for a dinner?")
```

### Text or number entry question {-}

```{r,echo=FALSE}
set.seed(1234567)
qualtrics$` Willingness-to-pay (in EUR)`<- abs(as.integer(rnorm(n = 117,mean=23,sd=40)))
qualtrics$` Customer Service` <- abs(as.integer(runif(n=117,min=0,max = 100)))
```


A text or number entry question is a recommended type of question if you are interested in obtaining ratio data type. We will use this type of question together with a constant sum question type to collect data that can be analysed with regression analysis. Note that in this case we treat constant sum data as ratio data and therefore assume that 0 means complete absence.  


Here is a glimpse of the answers on how important each factor is to our respondents when it comes to dining out:
```{r, echo= FALSE}
knitr::kable(qualtrics[1:6,c(" Location"," Price"," Ambience"," Customer Service")], caption = "Constant sum question")
```

Additionally, we asked our respondents how much they are willing to spend on dinner on average. To make the data easier to handle, we will create a new data frame that combines all the relevant variables:
```{r}
dinner <- subset(qualtrics, select = c(" Location"," Price"," Ambience"," Customer Service", " Willingness-to-pay (in EUR)"))
knitr::kable(head(dinner))
```
Before we conduct a linear regression analysis, we need to take a look at correlation matrix:  

```{r}
correlation <-cor(dinner, method=c('pearson'))
correlation
```
From our data we see, for instance, that there is some negative correlation between willingness to pay and the importance of ambience, as well as some positive correlation between the importance of customer service and willingness to pay. Let us look at the descriptive statistics as well:  
```{r}
knitr::kable(psych::describe(dinner))
```

We see that the difference between mean and median does not (at first sight) suggest a strong effect of outliers.
Let us now run the linear regression analysis:
```{r}
mlr.dinner <- lm(` Willingness-to-pay (in EUR)` ~ ` Location` + ` Price` + ` Ambience`+` Customer Service`, data = dinner)
summary(mlr.dinner)
```

```{r, echo=FALSE}
coeff <- summary(mlr.dinner)
```

Out of all the factors that matter when dining out, the only one that is significant at the 0.05 level is ambience. From the summary we can conclude that an increase in the importance of ambience by 1 point is associated with a change in willingness to pay of `r round(summary(mlr.dinner)$coefficients[4,1], 2)` EUR, i.e., a decrease.

```{r}
confint(mlr.dinner)
```

From the confidence intervals we can conclude that when all four importance scores (location, price, ambience and customer service) are equal to zero, the expected willingness to pay lies between `r round(confint(mlr.dinner)[1,1], 2)` EUR and `r round(confint(mlr.dinner)[1,2], 2)` EUR. In addition, for each increase in the importance of ambience by one point, willingness to pay changes on average by between `r round(confint(mlr.dinner)[4,1], 2)` EUR and `r round(confint(mlr.dinner)[4,2], 2)` EUR.
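
To see where these intervals come from, the following sketch reproduces `confint()` by hand for a simple regression on simulated data. All objects prefixed with `toy_` are hypothetical and unrelated to the survey data.

```{r}
set.seed(1)
# Hypothetical data: willingness to pay depends linearly on one importance rating
toy_x   <- runif(50, min = 0, max = 100)
toy_wtp <- 30 - 0.2 * toy_x + rnorm(50, sd = 5)
toy_lm  <- lm(toy_wtp ~ toy_x)

# Estimate +/- t-quantile * standard error gives the 95% confidence interval
toy_est   <- coef(summary(toy_lm))[, "Estimate"]
toy_se    <- coef(summary(toy_lm))[, "Std. Error"]
toy_tcrit <- qt(0.975, df = df.residual(toy_lm))
toy_ci    <- cbind(lower = toy_est - toy_tcrit * toy_se,
                   upper = toy_est + toy_tcrit * toy_se)
toy_ci

# Same numbers as the built-in helper
confint(toy_lm)
```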

```{r, warning=FALSE, error=FALSE}
ggcoefstats(x = mlr.dinner,
            title = "Willingness to pay predicted by importance of factors")
```


There are a couple of things we need to consider when we run a multiple linear regression. One of them is potential outliers in our data. Here we identify them:

```{r}
# Outliers
outlier_values <- boxplot.stats(mlr.dinner$residuals)$out  # outlier values.
outlier_values
```

We have identified the residuals that qualify as outliers. We can also visualize them:

```{r}
boxplot(mlr.dinner$residuals, main="Residuals: willingness to pay", boxwex=0.1)
```
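The rule behind `boxplot.stats()` can be spelled out: a value is flagged when it lies more than 1.5 times the distance between the hinges (approximately the interquartile range) beyond the lower or upper hinge. The sketch below reproduces this on simulated residuals; `sim_res` and the two extreme values added to it are hypothetical and serve only as an illustration.

```{r}
# Simulated residuals with two artificially extreme values (hypothetical data)
set.seed(1)
sim_res <- c(rnorm(50), 6, -7)

# Lower and upper hinge (roughly the first and third quartile)
sim_hinges <- fivenum(sim_res)[c(2, 4)]
sim_fence  <- 1.5 * diff(sim_hinges)

# Values outside the fences are flagged as outliers
sim_flag   <- sim_res < sim_hinges[1] - sim_fence | sim_res > sim_hinges[2] + sim_fence
manual_out <- sim_res[sim_flag]
manual_out

# Same values as reported by boxplot.stats()
all.equal(manual_out, boxplot.stats(sim_res)$out)

# Row positions of the flagged values, useful for inspecting the raw responses
which(sim_flag)
```

Knowing the row positions makes it easier to check whether a flagged response is a data entry error or a genuine, if unusual, answer.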

In addition, we need to observe whether there are any influential observations:

```{r}
plot(mlr.dinner,4)
```

A rule of thumb for classifying an observation as influential is to look for observations with a Cook’s distance > 1. We see from the graph that there are no influential observations.
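Instead of reading the values off the plot, the rule of thumb can also be checked numerically with `cooks.distance()`. The following sketch uses simulated data in which one observation is deliberately placed far from the rest; `sim_cook` and `fit_cook` are hypothetical names used only here.

```{r}
# Simulated data with one deliberately extreme observation (hypothetical data)
set.seed(7)
sim_cook <- data.frame(x = c(rnorm(30), 10))
sim_cook$y <- 1 + 2 * sim_cook$x + rnorm(31)
sim_cook$y[31] <- -20   # unusual x value combined with an unusual y value

fit_cook <- lm(y ~ x, data = sim_cook)

# Cook's distance for every observation
sim_cd <- cooks.distance(fit_cook)

# Observations exceeding the rule-of-thumb threshold of 1
which(sim_cd > 1)
```

Applied to the model from this chapter, `which(cooks.distance(mlr.dinner) > 1)` would be expected to return no observations if the graph indeed shows none above the threshold.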


Another thing to consider is linearity, i.e., that the relationship between the dependent and the independent variables can be reasonably approximated in linear terms:

```{r}
# Linear specification
library(car)
avPlots(mlr.dinner)
```

In our example, a linear relationship does not appear to be a reasonable assumption for all variables.
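One possible way to follow up on doubts about linearity is to compare the linear specification with one that adds a squared term and to test whether the extra term improves the fit. This is only a sketch on simulated data with a built-in curved relationship; the object names (`sim_lin`, `fit_linear`, `fit_quad`) are hypothetical, and a quadratic term is just one of several possible remedies.

```{r}
# Simulated data with a curved (U-shaped) relationship (hypothetical data)
set.seed(2024)
sim_lin <- data.frame(x = runif(150, 1, 7))
sim_lin$y <- 10 + 0.8 * (sim_lin$x - 4)^2 + rnorm(150)

# Linear specification versus a specification with an added squared term
fit_linear <- lm(y ~ x, data = sim_lin)
fit_quad   <- lm(y ~ x + I(x^2), data = sim_lin)

# F-test for nested models: does the squared term reduce the residual variance?
sim_anova <- anova(fit_linear, fit_quad)
sim_anova
```

A small p-value in the second row of the table indicates that the model with the squared term fits the data better than the purely linear one.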

As we already learned, another important assumption of the linear model is that the error terms have a constant variance (i.e., homoscedasticity):
```{r}
# Breusch-Pagan Test
library(lmtest)
bptest(mlr.dinner)
```

The null hypothesis of this test is that the error variances are all equal. Our result is not significant, so we cannot reject the null hypothesis and find no evidence of heteroscedasticity. We therefore treat this assumption as met.
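To see what the test does, the sketch below computes a Breusch-Pagan type statistic by hand: the squared residuals are regressed on the predictors, and the number of observations times the R-squared of this auxiliary regression is compared with a chi-squared distribution. To our understanding this corresponds to the studentized variant that `bptest()` uses by default, but treat that as an assumption. The data are simulated so that the error variance grows with `x`; all object names are hypothetical.

```{r}
# Simulated data whose error variance increases with x (hypothetical data)
set.seed(123)
sim_bp <- data.frame(x = runif(200, 1, 10))
sim_bp$y <- 5 + 2 * sim_bp$x + rnorm(200, sd = sim_bp$x)

fit_bp <- lm(y ~ x, data = sim_bp)

# Auxiliary regression: squared residuals on the predictor
sim_bp$res_sq <- resid(fit_bp)^2
fit_aux <- lm(res_sq ~ x, data = sim_bp)

# Test statistic n * R^2, chi-squared with one degree of freedom per predictor
bp_stat <- nobs(fit_aux) * summary(fit_aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

In contrast to our survey model, the p-value here is small, so the null hypothesis of constant error variance would be rejected for these simulated data.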

Another assumption is that the error term is normally distributed. One way to check this is to employ a statistical test with the null hypothesis that the data are normally distributed. One such test is the Shapiro–Wilk test:

```{r}
shapiro.test(resid(mlr.dinner))
```

When the assumption of normally distributed errors is not met (as in our case), this might again be due to a misspecification of the model, in which case it might help to transform the data.
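As an illustration of how a transformation can help, the following sketch simulates a right-skewed dependent variable and compares the Shapiro–Wilk test on the residuals before and after a log transformation. The data and the object names (`sim_norm`, `fit_raw`, `fit_log`) are hypothetical, and the log transformation is only one option, which presupposes strictly positive values.

```{r}
# Simulated right-skewed dependent variable (hypothetical data)
set.seed(11)
sim_norm <- data.frame(x = runif(150, 0, 5))
sim_norm$y <- exp(1 + 0.3 * sim_norm$x + rnorm(150, sd = 0.5))

# Model on the original scale and on the log scale
fit_raw <- lm(y ~ x, data = sim_norm)
fit_log <- lm(log(y) ~ x, data = sim_norm)

# Shapiro-Wilk p-values for the residuals of both models
p_raw <- shapiro.test(resid(fit_raw))$p.value
p_log <- shapiro.test(resid(fit_log))$p.value
c(raw = p_raw, log = p_log)
```

Note that after a log transformation the coefficients describe changes in the logarithm of the dependent variable, so their interpretation changes as well.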


Finally, we need to check for multicollinearity, the case when there is a strong linear relationship between the independent variables:

```{r}
correlation <- cor(dinner, method = "pearson")
correlation
```

By inspecting the correlation matrix, we can see that none of the coefficients is close to 0.8 or 0.9. Consequently, we conclude that there are no concerns regarding multicollinearity between the independent variables.
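Pairwise correlations can miss cases in which one predictor is explained by a combination of several others. A common complementary check is the variance inflation factor (VIF), computed as 1 / (1 - R²), where R² comes from regressing one predictor on all remaining predictors. The sketch below computes it by hand on simulated data with two deliberately collinear predictors; the data and object names are hypothetical. The `car` package loaded above also provides a `vif()` function for fitted models.

```{r}
# Simulated predictors, two of which are strongly related (hypothetical data)
set.seed(99)
sim_n   <- 200
sim_vif <- data.frame(price = rnorm(sim_n), location = rnorm(sim_n))
sim_vif$ambience <- 0.9 * sim_vif$price + rnorm(sim_n, sd = 0.3)

sim_predictors <- names(sim_vif)

# VIF for each predictor: regress it on all other predictors
vif_manual <- sapply(sim_predictors, function(v) {
  aux_formula <- reformulate(setdiff(sim_predictors, v), response = v)
  r2 <- summary(lm(aux_formula, data = sim_vif))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)
```

Values close to 1 indicate no problem, while values above roughly 5 to 10 are often read as a sign of problematic multicollinearity; the exact cut-off is a convention rather than a fixed rule.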


## Reporting {-}




## Presentation guidelines & grading {-}

Your performance in this part will be evaluated based on the following criteria:

* **Individual Responsibility:**

    + Group members should plan to share presentation responsibilities and field questions equally.
    + All members of the group must contribute to the presentation.
    + An individual grade is given for the presentation and for oral participation during the class.
    + To ensure an equal contribution of group members, a peer assessment will be conducted, which enters into the computation of the individual grades for the group project. 

* **Quality of questionnaire design:**

    + Survey method
    + Question structure / wording

* **Data Analysis:**

    + Clarity / appropriateness
    + Completeness / accuracy
    
* **Presentation:**

    + Introduction/problem, approach, solution/conclusion



### Final presentation and submission {-}


```{r eval = TRUE, echo = FALSE, warning=FALSE, message = FALSE}
library(dplyr)
library(kableExtra)
mytable_sub = data.frame(
    Date_A = c("Nov. 16*",
             "Nov. 23*",
             "Dec. 7"
             ),
    Time_A = c(
      "01:30PM - 04:30PM","01:30PM - 06:30PM","11:59PM"
    ),Date_B = c("Nov. 18*",
             "Nov. 25*",
             "Dec. 9"
             ),
    Time_B = c(
      "02:00PM - 05:00PM","03:00PM - 08:00PM","11:59PM"
    ),
    Task = c(  "* Coaching: Data handling (live video coaching)",
               "* Coaching: Data analysis (live video coaching)",
               "* Submit video recording of presentation (pre-recorded)"
               ),
    Chapters = c("","",""),
    Link = c(    "TBC",
                 "TBC",
                 ""
                 )
    )

#pander::pander(mytable_sub, keep.line.breaks = TRUE, style = 'grid', justify = 'left')
mytable_sub %>% kable(escape = T) %>%
  kable_paper(c("hover"), full_width = F) %>%
  footnote(general = "Dates and times are indicated for groups A and B respectively.
           Sessions indicated with '*' are group coaching sessions. Slots of 45 min. are assigned to each group within the indicated times.",
           general_title = "Note: ", 
           footnote_as_chunk = T, title_format = c("italic")
           ) 
#%>%   row_spec(c(1,3,6), background = "#E0E0E0")
```








